ciscocsirt / malspider

Malspider is a web spidering framework that detects characteristics of web compromises.
BSD 3-Clause "New" or "Revised" License

Jobs maxing at 10,000 #14

Open wrinkl3 opened 7 years ago

wrinkl3 commented 7 years ago

Malspider appears to stop launching new jobs once the number of completed jobs reaches 10,000 (as seen here: http://i.imgur.com/itENqeu.png). It still consumes a large amount of RAM. Restarting the server seems to solve it temporarily, as the "completed jobs" counter resets to zero.

jasheppa5 commented 7 years ago

Hey Alex,

New jobs are still launched, but with the settings Malspider ships with, scrapyd only keeps 10k finished jobs in the launcher. Once you reach 10k, the oldest finished jobs are removed from the launcher to keep the count at or below 10k. You can change this in the malspider/scrapyd.conf file: look for "finished_to_keep" and set it to whatever you like. scrapyd ships with a default value of 100, but I changed it to 10k for Malspider. You'll also see a "jobs_to_keep" setting, which controls how many finished jobs' log files are kept per spider.
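To show why the "completed jobs" number plateaus at 10,000 even though crawling continues, here is a toy model. It is not scrapyd's actual launcher code. It assumes only what is described above: the launcher retains the newest `finished_to_keep` finished jobs and drops the oldest. `LauncherModel`, `load_finished_to_keep` and the sample config text are hypothetical illustrations.

```python
from collections import deque
from configparser import ConfigParser

# Hypothetical excerpt of malspider/scrapyd.conf showing only the key discussed
# here, with the 10k value mentioned in this thread.
SAMPLE_CONF = """
[scrapyd]
finished_to_keep = 10000
"""


def load_finished_to_keep(text: str) -> int:
    """Read finished_to_keep from scrapyd-style INI text (scrapyd's default is 100)."""
    parser = ConfigParser()
    parser.read_string(text)
    return parser.getint("scrapyd", "finished_to_keep", fallback=100)


class LauncherModel:
    """Toy model (not scrapyd's code) of a launcher keeping only the newest N finished jobs."""

    def __init__(self, finished_to_keep: int) -> None:
        # deque(maxlen=N) silently discards the oldest entry once N is exceeded.
        self.finished = deque(maxlen=finished_to_keep)
        # What a true, never-trimmed "completed jobs" counter would report.
        self.total_completed = 0

    def job_finished(self, job_id: str) -> None:
        self.finished.append(job_id)
        self.total_completed += 1


launcher = LauncherModel(load_finished_to_keep(SAMPLE_CONF))
for i in range(12_000):
    launcher.job_finished(f"job-{i}")

print(len(launcher.finished))    # 10000: the displayed "finished" list is capped
print(launcher.total_completed)  # 12000: jobs kept completing past the cap
print(launcher.finished[0])      # job-2000: the oldest 2000 entries were evicted
```

The visible list length is bounded by the setting, while the real number of completed jobs keeps growing. That is why the counter looks "stuck" rather than the crawler actually stopping.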

Ideally we should have a counter in the database to track the # of jobs completed, or get rid of the "finished jobs" status altogether. I'm leaning towards getting rid of it since there are plenty of spidering stats and info about outstanding and running jobs. Do you agree with that?
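A database counter like the one suggested above could look roughly like the sketch below. It assumes a hypothetical `job_stats` table. `sqlite3` stands in for whatever database Malspider actually uses, and none of these names come from Malspider itself. Because the value lives in the database rather than in scrapyd's in-memory launcher, it would survive restarts and would not be trimmed by `finished_to_keep`.

```python
import sqlite3


def init_counter(conn: sqlite3.Connection) -> None:
    """Create the hypothetical job_stats table and seed the counter if absent."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_stats ("
        " name TEXT PRIMARY KEY,"
        " value INTEGER NOT NULL)"
    )
    # INSERT OR IGNORE leaves an existing count untouched on re-initialisation.
    conn.execute(
        "INSERT OR IGNORE INTO job_stats (name, value) VALUES ('jobs_completed', 0)"
    )
    conn.commit()


def record_job_completed(conn: sqlite3.Connection) -> None:
    """Increment the counter in a single UPDATE so concurrent writers don't lose counts."""
    conn.execute(
        "UPDATE job_stats SET value = value + 1 WHERE name = 'jobs_completed'"
    )
    conn.commit()


def jobs_completed(conn: sqlite3.Connection) -> int:
    """Return the persisted number of completed jobs."""
    row = conn.execute(
        "SELECT value FROM job_stats WHERE name = 'jobs_completed'"
    ).fetchone()
    return row[0]


# Usage with an in-memory database; a real deployment would use a persistent one.
conn = sqlite3.connect(":memory:")
init_counter(conn)
for _ in range(3):
    record_job_completed(conn)
print(jobs_completed(conn))  # 3
```

In a Scrapy project, a natural place to call something like `record_job_completed` would be a handler connected to the `spider_closed` signal. This is an assumption about how it might be wired in, not how Malspider currently works.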

-James

On Wed, Nov 30, 2016 at 3:34 AM, Alex Shatberashvili < notifications@github.com> wrote:

Malspider appears to stop launching new jobs once the number of the completed ones reaches 10,000 (as seen here http://i.imgur.com/itENqeu.png). It still consumes a large amount of RAM. Restarting the server seems to temporary solve it, as the "completed jobs" counter resets to zero.
