activecm / rita-legacy

Real Intelligence Threat Analytics (RITA) is a framework for detecting command and control communication through network traffic analysis.
GNU General Public License v3.0

Only maintain one cid's worth of max scores in the host collection #801

Closed by Zalgo2462 1 year ago

Zalgo2462 commented 1 year ago

This PR changes the summarize/aggregation phase of the IP beacons, proxy beacons, SNI beacons, and unique connection analyses to perform a full roll-up of all currently imported data for each host, rather than a roll-up of only the data that was just imported. This lets us store a single copy of the roll-ups in the host collection for each internal host instead of one record per chunked import. In turn, we no longer have to worry about old roll-ups falling out of sync with new data. Closes #800
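To illustrate the idea, here is a minimal sketch of a total roll-up in plain Go. It is not the code in this PR: `resultRecord`, `hostSummary`, and `totalRollup` are hypothetical names, and an in-memory slice and map stand in for the result and host collections. The point is only that the summary is rebuilt from every stored result for a host, so each host ends up with exactly one entry stamped with the current chunk ID.

```go
package main

import "fmt"

// resultRecord is a hypothetical stand-in for one stored analysis result
// (for example, a beacon result). It is not a RITA type.
type resultRecord struct {
	Host  string  // internal host the result belongs to
	CID   int     // chunk ID of the import that produced the result
	Score float64 // analysis score for this result
}

// hostSummary is a hypothetical per-host roll-up entry.
type hostSummary struct {
	CID      int     // chunk ID of the import that last rebuilt the summary
	MaxScore float64 // highest score across all currently stored results
}

// totalRollup rebuilds one summary per host from every stored result,
// regardless of which chunk imported it. Because the whole summary is
// recomputed, no roll-up from an older chunk is left behind to go stale.
func totalRollup(records []resultRecord, currentCID int) map[string]hostSummary {
	summaries := make(map[string]hostSummary)
	for _, r := range records {
		s, seen := summaries[r.Host]
		if !seen || r.Score > s.MaxScore {
			s.MaxScore = r.Score
		}
		s.CID = currentCID
		summaries[r.Host] = s
	}
	return summaries
}

func main() {
	// Results from two chunked imports (cid 0 and cid 1).
	records := []resultRecord{
		{Host: "10.0.0.1", CID: 0, Score: 0.90},
		{Host: "10.0.0.1", CID: 1, Score: 0.40},
		{Host: "10.0.0.2", CID: 1, Score: 0.75},
	}
	// One summary per host, even though 10.0.0.1 has results in both chunks.
	for host, s := range totalRollup(records, 1) {
		fmt.Printf("%s cid=%d max=%.2f\n", host, s.CID, s.MaxScore)
	}
}
```

A roll-up limited to the latest chunk would instead produce one entry per host per chunk ID, and the entries from earlier chunks would need to be revisited whenever later imports changed the underlying results.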

Testing: I've tested this PR against the logs linked in #800 and confirmed that the problem is fixed there. I am currently testing against other datasets to make sure there are no regressions.