Hi Pascal,
we have a cronjob that starts the crawl on a schedule. Sometimes we get the following error, and the crawler does not start the next scheduled crawl run:
java.lang.IllegalStateException: The file is locked: nio:./obfuscated/crawlstore/mvstore/obfuscated//mvstore [1.4.196/7]
I found this issue https://github.com/Norconex/collector-http/issues/336 and you mention it was already fixed. Any further suggestions?
Best regards Sascha
Which version are you using? Is that file shared by more than one running instance? Are you relaunching a crawler in the same JVM? If yes, it is possible Java did not release the file handle before you requested it again.
Hi Pascal,
We launch the crawler with:
sh collector-http.sh -a start -c project/crawler-configuration.xml
Best regards Sascha
Can you share a config that reproduces it?
This could happen if the crawler tries to open the crawlstore while there is still a lock on it. Is it possible that, when you start the crawler, the previous one has not yet terminated? You should normally see a different error if so, but I am looking for patterns.
So maybe forcing a delay between two runs could fix this? If you restart after that exception, does it eventually work, or does it always fail once that exception has been thrown? If it works fine after a while, that would tend to confirm there is sometimes not enough delay between two runs and that the lock from the previous run is still there.
One thing to try is to make sure one execution does not start until the previous one is terminated and/or modify the collector launch script to insert a wait time before starting the JVM.
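For example (this is standard cron tooling, not something built into the Collector), you could wrap the launch command with util-linux flock so a new run cannot start while the previous one still holds the lock file. The schedule and paths below are placeholders:

# Hypothetical crontab entry: flock -n aborts immediately if the previous
# run still holds the lock; drop -n to wait for the previous run instead.
*/30 * * * * /usr/bin/flock -n /tmp/collector-http.lock sh /path/to/collector-http.sh -a start -c project/crawler-configuration.xml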
There might be a way to fix this natively, but we need to find out what causes this first.
Hi Pascal,
The cronjob itself will restart, and the crawler then works as expected (sometimes it takes a few runs until the crawler starts again). As recommended, I added a delay of 15 minutes to the cronjob, but the crawler sometimes runs longer and again comes very close to the next crawl run, which leads to the same error message.
I tried to find the pattern in the log files; here you can see the time frames are overlapping. The "file is locked" exception is thrown while the previous crawl is still running, even though the log file itself is only written after the previous crawl run has completed.
Logfile 1 - /2019/07/01/logs/201907010545060000__obfuscated.log
2019-07-01 05:30:08 INFO - Starting execution.
2019-07-01 05:49:21 INFO - Running: END
Logfile 2 - /2019/07/01/logs/201907010549210895__logistics.log
2019-07-01 05:45:48 ERROR - Execution failed
java.lang.IllegalStateException: The file is locked: nio:./obfuscated/crawlstore/mvstore/obfuscated//mvstore [1.4.196/7]
How can I make sure one execution does not start until the previous one has terminated (when using a cronjob)?
Best regards Sascha
Normally you do not have to do anything since your new process would throw an exception and you should have in your logs something that starts with "JOB SUITE ALREADY RUNNING". In other words, it should detect a job is already running and abort (letting the current job finish normally).
I do not get why you have this error instead.
If you have your own code around the execution, you can have a look at com.norconex.jef4.status.JobSuiteStatusSnapshot#newSnapshot(File). The file is the suite .index file that gets created under your progress folder. Once you have that snapshot object created, you can invoke #getRoot() to obtain an IJobStatus instance. From that last object, you can use its methods to get the status of the last/current job.
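A minimal sketch of that check, assuming your JEF4 version exposes a getState() accessor on IJobStatus (verify the exact method name in the javadocs) and using a hypothetical .index file path:

import java.io.File;

import com.norconex.jef4.status.IJobStatus;
import com.norconex.jef4.status.JobSuiteStatusSnapshot;

public class CrawlRunGuard {
    public static void main(String[] args) throws Exception {
        // Hypothetical location of the suite .index file under the progress folder.
        File suiteIndex = new File("workdir/progress/latest/mycrawler.index");

        // Load a snapshot of the suite status and fetch the root job status.
        JobSuiteStatusSnapshot snapshot = JobSuiteStatusSnapshot.newSnapshot(suiteIndex);
        IJobStatus root = snapshot.getRoot();

        // getState() is an assumption; use whatever status accessor your
        // JEF4 version provides to decide whether a new run may start.
        System.out.println("Last/current job state: " + root.getState());
    }
}

Your cronjob could run such a check first and skip the launch while the previous job is still running.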
Hi Pascal,
Google has meanwhile narrowed the error down, most likely to an issue within Norconex itself. See https://github.com/google-cloudsearch/norconex-committer-plugin/issues/15
Any ideas how we can resolve this behavior?
Best regards Sascha
Since this one appears to be related to #634, shall we close as a duplicate?
Hi @essiembre, yes, absolutely.