Closed: r3comp1le closed this issue 7 years ago.
Hi,
Jobs are scheduled once every 24 hours. If a job is still running after 24 hours, there is a good chance the job is hung or the connection/response time is very slow - for that reason Malspider cancels any hanging jobs after 24 hours to make room for new crawls.
If a job still appears to be running even after the manage_spiders command calls the scrapyd API cancel function, then I'm not sure where the problem lies. Jobs should be terminated. Are you still seeing jobs with ps -aux several minutes after cancellation? If there is a bug in the scrapyd API, the only other way to cancel jobs immediately is to make a direct call to the underlying scrapyd SQLite database.
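For reference, a cancel-all pass against the scrapyd HTTP API boils down to roughly the following. This is a minimal sketch, assuming the localhost:6802 endpoint shown elsewhere in this thread; the actual manage_spiders implementation may be wired differently.

import requests

# Rough sketch: list every pending/running job for the project and ask scrapyd
# to cancel it. Host/port are assumptions based on the curl example in this thread.
SCRAPYD = "http://localhost:6802"
PROJECT = "malspider"

jobs = requests.get(SCRAPYD + "/listjobs.json", params={"project": PROJECT}).json()
for job in jobs.get("pending", []) + jobs.get("running", []):
    resp = requests.post(SCRAPYD + "/cancel.json",
                         data={"project": PROJECT, "job": job["id"]})
    print("canceled job %s for project '%s': %s" % (job["id"], PROJECT, resp.json()))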
-James
On Tue, Jun 14, 2016 at 7:14 AM, r3comp1le notifications@github.com wrote:
python malspider_django/manage.py manage_spiders
Canceling all outstanding jobs
canceled job 261c094631af11e6872f79b4a0de6dd8 for project 'malspider'
canceled job 8976429e31b011e6872f79b4a0de6dd8 for project 'malspider'
ps -aux
/usr/bin/python -m scrapyd.runner crawl full_domain -a _job=261c094631af11e6872f79b4a0de6dd8 -a org=1708 .....
/usr/bin/python -m scrapyd.runner crawl full_domain -a _job=8976429e31b011e6872f79b4a0de6dd8 -a org=1707 ....
Yes, the jobs are still there after it says it cancelled them. Going on 1+ hours since the cancellation attempt.
Very odd, still running even after:
curl http://localhost:6802/cancel.json -d project=malspider -d job=261c094631af11e6872f79b4a0de6dd8
{"status": "ok", "prevstate": "running"}
Just curious, does the job still show as active after you manually "kill" the process?
I'm going to look into this further and see if I can reproduce the issue and attach a debugger to the spider.
For some reason kill wasn't doing anything. I eventually rebooted. Appreciate the help.
I ran into this issue again tonight. I went from 1 org for testing to 10 orgs and ran the spider. 5 jobs finished, while 5 were still running. The elements and pages weren't incrementing anymore either.
Checked again this morning and now have 11 running jobs.
I added a few lines of code to force a timeout after crawling a domain for "X" amount of time. Early testing seems to indicate this fixes the issue with the spider hanging on certain domains.
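For anyone who wants that behavior before the change lands: Scrapy's built-in closespider extension can enforce the same kind of hard per-crawl time limit through the CLOSESPIDER_TIMEOUT setting. A rough sketch only; the actual Malspider patch may be implemented differently. The spider name full_domain comes from the job listings above, everything else here is illustrative.

import scrapy

class FullDomainSpider(scrapy.Spider):
    name = "full_domain"
    # Ask Scrapy to close the spider (reason: closespider_timeout) once it has
    # been running this many seconds, even if requests are still outstanding.
    custom_settings = {"CLOSESPIDER_TIMEOUT": 4 * 60 * 60}  # example value: 4 hours

    def start_requests(self):
        # Illustrative target only; Malspider feeds each organization's domain here.
        yield scrapy.Request("http://example.com")

    def parse(self, response):
        pass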