ciscocsirt / malspider

Malspider is a web spidering framework that detects characteristics of web compromises.
BSD 3-Clause "New" or "Revised" License

Are jobs really cancelled? #2

Closed r3comp1le closed 7 years ago

r3comp1le commented 8 years ago

python malspider_django/manage.py manage_spiders
Canceling all outstanding jobs
canceled job 261c094631af11e6872f79b4a0de6dd8 for project 'malspider'
canceled job 8976429e31b011e6872f79b4a0de6dd8 for project 'malspider'

ps -aux
/usr/bin/python -m scrapyd.runner crawl full_domain -a _job=261c094631af11e6872f79b4a0de6dd8 -a org=1708 .....
/usr/bin/python -m scrapyd.runner crawl full_domain -a _job=8976429e31b011e6872f79b4a0de6dd8 -a org=1707 ....

jasheppa5 commented 8 years ago

Hi,

Jobs are scheduled once every 24 hours. If a job is still running after 24 hours there is a good chance the job is hung or the connection/response time is very slow - for that reason Malspider cancels any hanging jobs after 24 hours to make room for new crawls.

If a job still appears to be running even after the manage_spiders command calls the scrapyd API cancel function, then I'm not sure where the problem lies. Jobs should be terminated. Are you still seeing jobs with ps -aux several minutes after cancellation? If there is a bug in the scrapyd API, the only other way to cancel jobs immediately is to make a direct call to the underlying scrapyd SQLite database.

-James
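
To make the "is it still there several minutes later?" check repeatable, here is a minimal sketch that pulls scrapyd job IDs out of a process listing. The function names and the regular expression are illustrative assumptions, not part of Malspider; the pattern is modelled only on the ps output pasted in this thread, and the ps flags assume a Linux-like ps.

```python
import re
import subprocess

# Matches command lines shaped like the ones pasted in this thread:
#   /usr/bin/python -m scrapyd.runner crawl full_domain -a _job=<hex id> -a org=1708
RUNNER_RE = re.compile(r"scrapyd\.runner\s+crawl\s+(\S+).*?_job=([0-9a-f]+)")


def running_runner_jobs(ps_output):
    """Return {job_id: spider_name} for every scrapyd runner line in ps_output."""
    jobs = {}
    for line in ps_output.splitlines():
        match = RUNNER_RE.search(line)
        if match:
            jobs[match.group(2)] = match.group(1)
    return jobs


def current_ps_output():
    """Capture the process list (assumes a ps that accepts -A and -o)."""
    result = subprocess.run(
        ["ps", "-A", "-o", "pid,args"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling running_runner_jobs(current_ps_output()) a few minutes after manage_spiders reports a cancellation shows whether the job IDs it printed still have live processes.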

r3comp1le commented 8 years ago

Yes, the jobs are still there after it says it cancelled them. It has been over an hour since the cancellation attempt.

r3comp1le commented 8 years ago

Very odd, still running even after:

curl http://localhost:6802/cancel.json -d project=malspider -d job=261c094631af11e6872f79b4a0de6dd8

{"status": "ok", "prevstate": "running"}
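
In the reply above, "prevstate" describes the job's state before the cancel request, so an "ok" status by itself does not show that the process has exited. A rough sketch of cancelling and then checking scrapyd's own view of the job follows. It uses only the cancel.json and listjobs.json endpoints; the helper names are made up for illustration, and the base URL is assumed from the curl command above.

```python
import json
import urllib.parse
import urllib.request

# Assumption: same host and port as the curl command in this thread.
SCRAPYD_URL = "http://localhost:6802"


def cancel_job(project, job_id, base_url=SCRAPYD_URL):
    """POST to cancel.json and return the decoded JSON reply."""
    data = urllib.parse.urlencode({"project": project, "job": job_id}).encode()
    with urllib.request.urlopen(base_url + "/cancel.json", data=data, timeout=10) as resp:
        return json.load(resp)


def list_jobs(project, base_url=SCRAPYD_URL):
    """GET listjobs.json for one project and return the decoded JSON reply."""
    query = urllib.parse.urlencode({"project": project})
    with urllib.request.urlopen(base_url + "/listjobs.json?" + query, timeout=10) as resp:
        return json.load(resp)


def is_still_running(listjobs_reply, job_id):
    """True if job_id appears in the 'running' list of a listjobs.json reply."""
    return any(job.get("id") == job_id for job in listjobs_reply.get("running", []))
```

For example, is_still_running(list_jobs("malspider"), "261c094631af11e6872f79b4a0de6dd8") can be polled after cancel_job(...) to see whether scrapyd still considers the job active.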

jasheppa5 commented 8 years ago

Just curious, does the job still show as active after you manually "kill" the process?

I'm going to look into this further and see if I can reproduce the issue and attach a debugger to the spider.

r3comp1le commented 8 years ago

For some reason kill wasn't doing anything. I eventually rebooted. Appreciate the help.

r3comp1le commented 8 years ago

I ran into this issue again tonight. I went from 1 org for testing to 10 orgs and ran the spider. 5 jobs finished, while 5 were still running. The elements and pages weren't incrementing anymore either.

r3comp1le commented 8 years ago

Checked again this morning and there are now 11 running jobs.

jasheppa5 commented 7 years ago

I added a few lines of code to force a timeout after crawling a domain for "X" amount of time. Early testing seems to indicate this fixes the issue with the spider hanging on certain domains.
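
The exact lines added to Malspider are not shown in this thread. As a hedged illustration only, one common way to cap crawl time in a Scrapy project is the built-in CLOSESPIDER_TIMEOUT setting, optionally combined with DOWNLOAD_TIMEOUT for individual slow responses. The values below are hypothetical placeholders for "X", not the project's actual configuration.

```python
# settings.py fragment (sketch; values are assumptions, not Malspider's)

# Hypothetical per-crawl budget standing in for "X" in the comment above.
CRAWL_BUDGET_SECONDS = 60 * 60

# Scrapy's CloseSpider extension closes the spider once it has been
# open for this many seconds (0 disables the limit).
CLOSESPIDER_TIMEOUT = CRAWL_BUDGET_SECONDS

# Seconds the downloader waits for a single response before giving up.
DOWNLOAD_TIMEOUT = 30
```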