Norconex / crawlers

Norconex Crawlers (or spiders) are flexible web and filesystem crawlers that collect, parse, and manipulate data from the web or a filesystem and store it in various data repositories, such as search engines.
https://opensource.norconex.com/crawlers
Apache License 2.0

The file is locked: nio:./crawler/crawlstore/mvstore/crawler//mvstore [1.4.196/7] #615

Closed SaschaHeyer closed 5 years ago

SaschaHeyer commented 5 years ago

Hi Pascal,

we have a cronjob which is starting the crawling in a scheduled manner. Sometimes we get the following error and the crawler does not start the next scheduled crawl run.

I found this issue https://github.com/Norconex/collector-http/issues/336 and you mention it was already fixed, any further suggestions?

java.lang.IllegalStateException: The file is locked: nio:./crawler/crawlstore/mvstore/crawler//mvstore [1.4.196/7]
        at org.h2.mvstore.DataUtils.newIllegalStateException(DataUtils.java:765)
        at org.h2.mvstore.FileStore.open(FileStore.java:173)
        at org.h2.mvstore.MVStore.<init>(MVStore.java:347)
        at org.h2.mvstore.MVStore.open(MVStore.java:395)
        at com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStore.<init>(MVStoreCrawlDataStore.java:57)
        at com.norconex.collector.core.data.store.impl.mvstore.MVStoreCrawlDataStoreFactory.createCrawlDataStore(MVStoreCrawlDataStoreFactory.java:49)
        at com.norconex.collector.core.crawler.AbstractCrawler.createCrawlDataStore(AbstractCrawler.java:243)
        at com.norconex.collector.core.crawler.AbstractCrawler.doExecute(AbstractCrawler.java:204)
        at com.norconex.collector.core.crawler.AbstractCrawler.startExecution(AbstractCrawler.java:184)
        at com.norconex.jef4.job.AbstractResumableJob.execute(AbstractResumableJob.java:49)
        at com.norconex.jef4.suite.JobSuite.runJob(JobSuite.java:355)
        at com.norconex.jef4.suite.JobSuite.doExecute(JobSuite.java:296)
        at com.norconex.jef4.suite.JobSuite.execute(JobSuite.java:168)
        at com.norconex.collector.core.AbstractCollector.start(AbstractCollector.java:131)
        at com.norconex.collector.core.AbstractCollectorLauncher.launch(AbstractCollectorLauncher.java:95)
        at com.norconex.collector.http.HttpCollector.main(HttpCollector.java:74)

Best regards Sascha

essiembre commented 5 years ago

Which version are you using? Is that file shared by more than one running instance? Are you relaunching a crawler in the same JVM? If so, it is possible Java did not release the file handle before you requested it again.

SaschaHeyer commented 5 years ago

Hi Pascal,

Best regards Sascha

essiembre commented 5 years ago

Can you share a config that reproduces it?

This could happen if the crawler tries to open the crawlstore while there is still a lock on it. Is it possible that, when you start the crawler, the previous one has not yet terminated? You should normally see a different error in that case, but I am looking for patterns.

So maybe forcing a delay between two runs could fix this? If you restart after that exception, does it eventually work, or does it always fail once that exception is thrown? If it works fine after a while, that would tend to confirm there is sometimes not enough delay between two runs and the lock from the previous run is still there.

One thing to try is to make sure one execution does not start until the previous one has terminated, and/or to modify the collector launch script to insert a wait time before starting the JVM.
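If adjusting the schedule alone is too brittle, one portable way to serialize runs is an OS-level file lock taken around the launch. This is not a Norconex feature; the class name, lock-file name, and launch placeholder below are invented for illustration, using only the JDK:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative single-instance guard (not part of Norconex): skip a run
// while the previous one still holds the lock file.
public class RunGuard {

    // Returns an exclusive lock on the marker file, or null if another
    // process (or this JVM) still holds it.
    public static FileLock tryAcquire(Path lockFile) throws IOException {
        FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        try {
            FileLock lock = channel.tryLock();
            if (lock == null) {
                channel.close(); // lock held by another process
            }
            return lock;
        } catch (OverlappingFileLockException e) {
            channel.close();     // lock already held within this JVM
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        FileLock lock = tryAcquire(Path.of("collector.lock"));
        if (lock == null) {
            System.out.println("Previous crawl still running; skipping.");
            return;
        }
        try {
            // Launch the collector here instead of this placeholder,
            // e.g. by calling its main class.
            System.out.println("Lock acquired; starting crawl.");
        } finally {
            lock.release();
            lock.channel().close();
        }
    }
}
```

On Linux, wrapping the cron command with `flock -n <lockfile> <command>` achieves the same effect without writing any code.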

There might be a way to fix this natively, but we need to find out what causes this first.

SaschaHeyer commented 5 years ago

Hi Pascal,

The cronjob itself will restart, and the crawler then works as expected (sometimes it takes a few runs before the crawler starts again). As recommended, I added a delay of 15 minutes to the cronjob, but the crawler sometimes runs longer and again finishes very close to the next crawl run, which leads to the same error message.

I tried to find a pattern in the log files; here you can see the time frames overlap. The "file is locked" exception is thrown while the crawler is still running, though the log file itself is written after the previous crawl run completes.

Logfile 1 - /2019/07/01/logs/201907010545060000__obfuscated.log

2019-07-01 05:30:08 INFO - Starting execution.
2019-07-01 05:49:21 INFO - Running: END

Logfile 2 - /2019/07/01/logs/201907010549210895__logistics.log

2019-07-01 05:45:48 ERROR - Execution failed
java.lang.IllegalStateException: The file is locked: nio:./obfuscated/crawlstore/mvstore/obfuscated//mvstore [1.4.196/7]

How can I make sure one execution does not start until the previous one is terminated? (When using cronjob).

Best regards Sascha

essiembre commented 5 years ago

Normally you do not have to do anything, since your new process would throw an exception and your logs should contain an entry starting with "JOB SUITE ALREADY RUNNING". In other words, it should detect that a job is already running and abort (letting the current job finish normally).

I do not get why you have this error instead.

If you have your own code around the execution, you can have a look at

com.norconex.jef4.status.JobSuiteStatusSnapshot#newSnapshot(File)

The file is the suite .index file that gets created under your progress folder. Once you have that snapshot object created, you can invoke #getRoot() to obtain an IJobStatus instance, and from that object use its methods to get the status of the last/current job.
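A sketch of the check described above. Only `JobSuiteStatusSnapshot#newSnapshot(File)` and `#getRoot()` are named in this thread; the index-file path is an assumption and should be adjusted to your actual progress folder, so verify against your JEF4 version:

```java
import java.io.File;

import com.norconex.jef4.status.IJobStatus;
import com.norconex.jef4.status.JobSuiteStatusSnapshot;

// Hypothetical pre-launch check using the JEF4 API mentioned above.
public class SuiteStatusCheck {
    public static void main(String[] args) throws Exception {
        // Assumed location of the suite .index file under the progress folder.
        File suiteIndex = new File("progress/latest/mycrawler.index");
        JobSuiteStatusSnapshot snapshot =
                JobSuiteStatusSnapshot.newSnapshot(suiteIndex);
        IJobStatus root = snapshot.getRoot();
        // Inspect root's state accessors here and skip launching the
        // collector if the previous job is still marked as running.
    }
}
```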

SaschaHeyer commented 5 years ago

Hi Pascal,

Google has meanwhile narrowed the error down, most likely to an issue within Norconex itself. See https://github.com/google-cloudsearch/norconex-committer-plugin/issues/15

Any ideas on how we can resolve this behavior?

Best regards Sascha

essiembre commented 5 years ago

Since this one appears to be related to #634, shall we close as a duplicate?

SaschaHeyer commented 5 years ago

Hi @essiembre yes absolutely