asepaprianto / crawler4j

Automatically exported from code.google.com/p/crawler4j

Deleting crawl storage folder after crawling? #243

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Create a temp dir for crawl storage
2. After the controller has finished its job, delete the crawl folder and its contents
3. Deletion fails with the exception: frontier\00000000.jdb: The process cannot access the file because it is being used by another process.

What is the expected output? What do you see instead?

What version of the product are you using?
3.5

Please provide any additional information below.

I'm using the Controller/Crawler example.

Path tempDirectory = Files.createTempDirectory("betis-crawl");
String crawlStorageFolder = tempDirectory.toString();

...
controller.start(MyCrawler.class, numberOfCrawlers);
...

controller.shutdown();

Path start = tempDirectory;
Files.walkFileTree(start, new SimpleFileVisitor<Path>() {
    @Override
    public FileVisitResult visitFile(Path file,
        BasicFileAttributes attrs) throws IOException {
        Files.delete(file);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult postVisitDirectory(Path dir, IOException e)
        throws IOException {
        if (e == null) {
            Files.delete(dir);
            return FileVisitResult.CONTINUE;
        } else {
            // directory iteration failed
            throw e;
        }
    }
});
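For reference, the cleanup step above can be sketched as a small standalone helper. Since the frontier's BerkeleyDB environment may release its file handles only shortly after shutdown (especially on Windows), the sketch wraps the walk in a short retry loop; the class name, retry count, and delay are my own assumptions, not crawler4j API.

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

public class CrawlStorageCleanup {

    // Recursively deletes a directory tree, retrying a few times in case
    // another process (e.g. the BerkeleyDB environment backing the frontier)
    // is still releasing its file handles. Retry count and delay are arbitrary.
    static void deleteRecursively(Path root) throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 0; attempt < 5; attempt++) {
            try {
                Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                    @Override
                    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                            throws IOException {
                        Files.delete(file);
                        return FileVisitResult.CONTINUE;
                    }

                    @Override
                    public FileVisitResult postVisitDirectory(Path dir, IOException e)
                            throws IOException {
                        if (e != null) throw e; // directory iteration failed
                        Files.delete(dir);
                        return FileVisitResult.CONTINUE;
                    }
                });
                return; // whole tree deleted
            } catch (IOException e) {
                last = e;
                Thread.sleep(200); // give the lock holder a moment, then retry
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulate a crawl storage folder with a frontier database file.
        Path dir = Files.createTempDirectory("betis-crawl-test");
        Files.createDirectories(dir.resolve("frontier"));
        Files.createFile(dir.resolve("frontier").resolve("00000000.jdb"));
        deleteRecursively(dir);
        System.out.println("deleted: " + !Files.exists(dir));
    }
}
```

If the retry still fails, the lock is genuinely held and the real fix is to make sure the controller (and its underlying database environment) has fully shut down before the deletion starts.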

Original issue reported on code.google.com by ihsancif...@gmail.com on 21 Nov 2013 at 8:49

GoogleCodeExporter commented 9 years ago
Not a bug or feature request, thus moved to the forum.

Original comment by avrah...@gmail.com on 17 Aug 2014 at 5:20