RITA slow/ not working on ~500GB 24 hour dataset

kyleEeeEEeeee commented 2 years ago

Hello,

Still doing some testing of the tool. We've seen great results so far. We are currently trying to run it against the following set of logs (massive network):

24 hours: conn.log: 438GB, dns.log: 93GB, ssl.log: 23GB, http.log: 5.8GB

It's not working like it has with smaller data sets. I think it could be a resource issue on our end, but I'm curious whether you all are aware of a limit on what RITA can handle. Is there a certain size where it starts becoming unreliable?

Thanks in advance :)

kyleEeeEEeeee commented 1 year ago

Hello,

I am now realizing that I have the same bug/issue that @Zalgo2462 is working on. Is there a specific unique hostname/IP count we need to stay under for this to work, like the value Zalgo changed from 200 to 150? We are trying to figure out whether some customer networks will simply be too large to run the tool on. Thanks :)

Zalgo2462 commented 1 year ago

Hello, I believe that would be larger than any dataset I have personally tested RITA with. I imagine that RITA may take 24 hours or longer to process that much data. If the data can be cleanly partitioned (by Zeek sensor, by internal subnets, or by hour), I would recommend splitting the data and running RITA on each partition separately.
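As a rough illustration of the "by hour" partitioning suggested above, here is a minimal Python sketch. It is not part of RITA. It assumes the logs are uncompressed Zeek TSV files with a `#fields` header line and a `ts` column holding epoch seconds; the function name `split_zeek_log_by_hour` and the output layout (one directory per UTC hour) are made up for this example.

```python
import os
from datetime import datetime, timezone


def split_zeek_log_by_hour(src_path, out_dir):
    """Split a Zeek TSV log into one file per UTC hour, keyed on the ts column.

    Hypothetical helper for illustration; returns {hour_label: row_count}.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.basename(src_path)
    header = []      # "#..." metadata lines, copied into every partition
    ts_index = None  # column position of "ts", read from the "#fields" line
    outputs = {}     # hour label -> open file handle
    counts = {}      # hour label -> number of data rows written
    try:
        with open(src_path, "r", encoding="utf-8", errors="replace") as src:
            for line in src:
                if line.startswith("#"):
                    if line.startswith("#close"):
                        continue  # trailing marker, not needed in partitions
                    header.append(line)
                    if line.startswith("#fields"):
                        names = line.rstrip("\n").split("\t")[1:]
                        ts_index = names.index("ts")
                    continue
                if ts_index is None:
                    raise ValueError("no #fields header seen before data rows")
                ts = float(line.split("\t")[ts_index])
                hour = datetime.fromtimestamp(ts, tz=timezone.utc).strftime(
                    "%Y-%m-%d-%H"
                )
                if hour not in outputs:
                    hour_dir = os.path.join(out_dir, hour)
                    os.makedirs(hour_dir, exist_ok=True)
                    outputs[hour] = open(
                        os.path.join(hour_dir, base), "w", encoding="utf-8"
                    )
                    outputs[hour].writelines(header)
                    counts[hour] = 0
                outputs[hour].write(line)
                counts[hour] += 1
    finally:
        for handle in outputs.values():
            handle.close()
    return counts
```

Running this once per log type (conn.log, dns.log, ssl.log, http.log) with the same `out_dir` would leave one directory per hour, each of which could then be processed separately. A connection is assigned to the hour of its `ts` value, so long-lived connections are not split across partitions.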

In general, the FQDN beaconing analysis will take the longest out of the different analysis modules. I have had to disable this analysis in the RITA config file in the past when working with large datasets.
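On the earlier question about unique hostname/IP counts: this thread does not state a threshold, but one rough way to size a dataset before deciding whether to split it or disable an analysis is to count the distinct values in the relevant Zeek columns. The sketch below is only an illustration and is not how RITA itself counts hosts. It assumes uncompressed Zeek TSV logs with the standard `#fields` header; `count_unique_values` is a hypothetical helper name.

```python
def count_unique_values(log_path, field_names):
    """Count distinct values across the named columns of a Zeek TSV log.

    Hypothetical helper for illustration; holds all unique values in memory.
    """
    seen = set()
    indexes = None  # column positions of field_names, from the "#fields" line
    with open(log_path, "r", encoding="utf-8", errors="replace") as log:
        for line in log:
            if line.startswith("#"):
                if line.startswith("#fields"):
                    fields = line.rstrip("\n").split("\t")[1:]
                    indexes = [fields.index(name) for name in field_names]
                continue
            if indexes is None:
                raise ValueError("no #fields header seen before data rows")
            columns = line.rstrip("\n").split("\t")
            for i in indexes:
                # Skip Zeek's default unset ("-") and empty markers.
                if i < len(columns) and columns[i] not in ("-", "(empty)"):
                    seen.add(columns[i])
    return len(seen)
```

For example, `count_unique_values("conn.log", ["id.orig_h", "id.resp_h"])` gives the number of distinct IP addresses seen in connections, and `count_unique_values("dns.log", ["query"])` gives the number of distinct queried names.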

In the ticket https://github.com/activecm/rita/issues/759, the error causes RITA to skip the beaconFQDN analysis, so this is likely related to but different from the issues you are seeing.

Please post a copy of the output you are seeing from RITA. From there we can see where RITA is getting stuck and whether we can help it along somehow.

Additionally, the files in /var/lib/rita/logs might help us figure out what is going wrong.

Thank you for your time and interest in the project!

kyleEeeEEeeee commented 1 year ago

Hello,

So sorry about the delay. We realized that the issue seemed to be related to CPU/RAM resources. We could get through that giant 570 GB batch once we increased the resources on the test VM. The approach you recommended, splitting the data and importing it into a rolling database, also worked, but writing the show-beacons-fqdn output to CSV (to put into Splunk) wouldn't work afterwards. So I'm thinking that disabling the FQDN analysis on huge data sets may be the way to go, like you had said. When it was failing we would get something like this (the error with the 16 MB):

activecm / rita-legacy

RITA slow/ not working on ~500GB 24 hour dataset #757