LAW-Unimi / BUbiNG

The LAW next generation crawler.
http://law.di.unimi.it/software.php#bubing
Apache License 2.0

Gracefully recover crawl when unexpectedly stopped #27

Closed kasparas12 closed 3 years ago

kasparas12 commented 3 years ago

Hello, I am playing with the crawler and starting the crawl by issuing this command:

nohup java -cp bubing-0.9.15.jar:lib/* -server -Xss256K -Xms20G -XX:+UseNUMA -Djavax.net.ssl.sessionCacheSize=8192 \
        -XX:+UseTLAB -XX:+ResizeTLAB -XX:NewRatio=4 -XX:MaxTenuringThreshold=15 -XX:+CMSParallelRemarkEnabled \
        -verbose:gc -Xloggc:gc.log -XX:+PrintGCDetails \
        -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 \
        -Djava.rmi.server.hostname=<hostname> \
        -Djava.net.preferIPv4Stack=true \
        -Djgroups.bind_addr=<hostname> \
        -Dlogback.configurationFile=bubing-logback.xml \
        -Dcom.sun.management.jmxremote.port=9999 -Dcom.sun.management.jmxremote.rmi.port=9998 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false \
        it.unimi.di.law.bubing.Agent -h <hostname> -P eu.properties -g eu agent -n 2>err >out

Everything seems to run fine, and the crawl starts as a background process. Because I run crawls on AWS Spot instances, an instance can be reclaimed at any time. The volume data is preserved, but the machine is swapped for another one, so the crawl is interrupted. I tried to simulate this by killing the Java process and then issuing the command above without the -n option to continue the crawl. The crawl logs then show these errors:

2021-04-26 07:25:21,631 1588 ERROR [main] i.u.d.l.b.f.Frontier - Trying to restore state from snap directory crawl-digital/frontier/snap, but it does not exist or is not a directory
2021-04-26 07:25:21,632 1589 ERROR [Distributor] i.u.d.l.b.f.Distributor - Unexpected exception
java.lang.NullPointerException: null
        at it.unimi.di.law.bubing.frontier.Distributor.run(Distributor.java:134)
2021-04-26 07:25:21,775 1732 ERROR [MessageThread] i.u.d.l.b.f.MessageThread - Unexpected exception
java.lang.NullPointerException: null
        at it.unimi.di.law.bubing.frontier.MessageThread.run(MessageThread.java:54)

So I guess the snap directory is somehow not being created. Does anyone know why this might happen? It prevents me from continuing the crawl.

Thank you.

vigna commented 3 years ago

On 26 Apr 2021, at 08:34, Kasparas Taminskas @.***> wrote:

So I guess the snap directory is somehow not being created. Does anyone know why this might happen? It prevents me from continuing the crawl.

You have to stop the agent cleanly so that it writes a snapshot of its current state. If you kill the process, no snapshot exists and the crawl cannot be resumed.

Ciao,

                seba
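
For readers wondering what a clean stop could look like in practice: the command above exposes JMX on port 9999 without authentication, so one option is to ask the agent to shut down through a JMX client instead of killing the process. The following is only a sketch under stated assumptions. The javax.management calls are standard Java, but the MBean name pattern "it.unimi.di.law.bubing:*" and the operation name "stop" are guesses that are not confirmed anywhere in this thread; check the actual names with jconsole or the BUbiNG documentation before relying on them. The canResume helper only tests for the snap directory named in the error message.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class CleanStop {
    // Standard JMX-over-RMI URL for a JVM started with
    // -Dcom.sun.management.jmxremote.port=<port>.
    static String jmxUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    // True if <frontierDir>/snap exists and is a directory, which is what the
    // Frontier error message in the log complains about when it is missing.
    static boolean canResume(Path frontierDir) {
        return Files.isDirectory(frontierDir.resolve("snap"));
    }

    // Invokes a no-argument operation on every MBean matching the pattern and
    // returns how many MBeans matched. The call may fail with an I/O error if
    // the remote JVM exits before the reply is delivered.
    static int invokeOnAll(String host, int port, String pattern, String operation)
            throws Exception {
        JMXServiceURL url = new JMXServiceURL(jmxUrl(host, port));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbeans = connector.getMBeanServerConnection();
            Set<ObjectName> names = mbeans.queryNames(new ObjectName(pattern), null);
            for (ObjectName name : names) {
                mbeans.invoke(name, operation, new Object[0], new String[0]);
            }
            return names.size();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0 && args[0].equals("check")) {
            // Path taken from the log message in this thread.
            Path frontier = Paths.get("crawl-digital/frontier");
            System.out.println("Snapshot present: " + canResume(frontier));
            return;
        }
        String host = args.length > 0 ? args[0] : "localhost";
        // ASSUMPTION: both the MBean pattern and the operation name are hypothetical.
        int matched = invokeOnAll(host, 9999, "it.unimi.di.law.bubing:*", "stop");
        System.out.println("Invoked stop on " + matched + " MBean(s)");
    }
}
```

Running it with the argument "check" before restarting shows whether a snapshot is present; if it is not, restarting without -n will fail in the way shown in the log above.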