Hi Alex,
Consider two things:
First, Malspider crawls pages beyond the homepage. This feature was added last month after popular demand. The PAGES_PER_DOMAIN variable can be set to whatever you feel is best (it can even scan an entire domain), but I think a limit like 20 prevents bottlenecks. It also protects you against cases where PhantomJS may hang; this seems to be a common problem among people who use PhantomJS for heavy crawling. In my research, crawling more than 20 pages beyond the homepage had no benefit, and the limit also keeps your footprint small. The only time I would crawl a full domain is if I were scanning my org's web presence or intentionally monitoring client domains or something... basically non-research purposes.
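For reference, here's roughly where that knob lives. Malspider is built on Scrapy, so this is a minimal sketch assuming PAGES_PER_DOMAIN is defined in the project's settings module; treat the file name and surrounding comments as illustrative, not the exact source:

```python
# settings.py (illustrative; the exact file and defaults may differ in your checkout)

# Maximum number of pages to crawl beyond each domain's homepage.
# 20 is a sensible cap: deeper crawls showed no benefit in my testing,
# and long crawls increase the chance of a hung PhantomJS process.
PAGES_PER_DOMAIN = 20

# Raise this (e.g. to cover an entire domain) only for non-research use,
# such as monitoring your own org's web presence.
# PAGES_PER_DOMAIN = 100000
```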
Second, scan time and storage. I test with about 1,100 domains and use a proxy service to hide the origin of my traffic. On my home internet connection I was able to scan all 1,100 domains (20 pages beyond the homepage for each domain) in about 90 minutes, and roughly 6 GB of data was stored in the database. Scanning significantly more domains (or pages per domain) in a 24-hour period is certainly possible.
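To put rough numbers on that run (a back-of-envelope sketch derived from the figures above; the per-page average and the 24-hour extrapolation are estimates, not measurements):

```python
# Back-of-envelope throughput from the 1,100-domain run described above.
domains = 1100
pages_per_domain = 21      # homepage + 20 additional pages
minutes = 90
storage_gb = 6

total_pages = domains * pages_per_domain            # 23,100 pages
pages_per_min = total_pages / minutes               # ~257 pages/minute
kb_per_page = storage_gb * 1024**2 / total_pages    # ~272 KB stored per page

# Extrapolating to a 24-hour window at the same rate:
pages_per_day = pages_per_min * 60 * 24             # ~370,000 pages
gb_per_day = pages_per_day * kb_per_page / 1024**2  # ~96 GB of storage
```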
PS: A new version is coming out very soon. It will support Yara signatures and immediate page analysis (instead of post-processing the data).
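I can't share the new code yet, but to give a flavor of what immediate Yara-based page analysis looks like, here's a minimal sketch using the yara-python package. The rule file path and the analyze_page helper are placeholders of mine, not the actual implementation:

```python
import yara

# Compile the signature set once at startup ('rules.yar' is a placeholder path).
rules = yara.compile(filepath='rules.yar')

def analyze_page(url, body):
    """Scan a fetched page body immediately instead of post-processing it later."""
    matches = rules.match(data=body)
    for m in matches:
        # m.rule is the name of the Yara rule that fired.
        print('%s matched rule %s' % (url, m.rule))
    return matches
```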
Thanks, James
On Tue, Sep 20, 2016 at 8:32 AM, Alex Shatberashvili <notifications@github.com> wrote:
My current project might involve monitoring around 1200 small-to-medium sized domains. Other than the database size, are there any bottlenecks I should consider?