Yamato-Security / takajo

Takajō (鷹匠) is a Hayabusa results analyzer.
https://yamato-security.github.io/takajo/
GNU General Public License v3.0

Add low-memory-mode option to support larger input files #154

Open einarssonm opened 3 months ago

einarssonm commented 3 months ago

Would it be possible to add a --low-memory-mode option for Takajo, similar to the recently added option in Hayabusa?

I often process Windows Event Forwarding (WEF) logs (ForwardedEvents.evtx), which are 20 GB or larger. This is how I process the .evtx file(s) with Hayabusa, which results in a ~7 GB .jsonl file:

.\hayabusa-2.15.0-win-x64\hayabusa-2.15.0-win-x64.exe json-timeline --JSONL-output --EID-filter --UTC --enable-unsupported-rules --visualize-timeline --profile verbose --low-memory-mode --no-wizard --exclude-tag sysmon --file .\ForwardedEvents.evtx --output json-timeline.jsonl

...and this is how I process the Hayabusa output with Takajo:

.\takajo-2.5.0-win\takajo.exe automagic --timeline .\json-timeline.jsonl --output takajo-results

Takajo consumes quite a lot of memory:

image

... and finally fails with an "out of memory" error:

image

When Takajo crashes, the output location contains a scriptblock-logs directory with 39,800 files.

einarssonm commented 3 months ago

Feel free to remove the "Bug" label, since it would rather be an "Enhancement".

YamatoSecurity commented 3 months ago

@einarssonm Thanks for reporting this to us. Since Takajo is processing smaller files (than the original .evtx), I figured it wouldn't be necessary to have a low-memory-mode, but if people are experiencing crashes, we will look into it. However, since Takajo needs to sort results, it might be difficult to do this without having a high-spec backend database... which would defeat the purpose as you would end up needing much more memory.
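
For context on the sorting concern, one general technique that trades memory for disk I/O is an external merge sort: sort fixed-size chunks in memory, spill each to a temporary file, then merge the sorted chunks as streams. The Python sketch below is illustrative only and is not how Takajo (which is written in Nim) works. It assumes each JSONL line is an object with a `Timestamp` string whose lexicographic order matches chronological order; `external_sort` and `chunk_lines` are hypothetical names.

```python
import heapq
import json
import os
import tempfile
from itertools import islice


def sort_key(line):
    # Assumption: every record carries a "Timestamp" string that sorts
    # chronologically when compared as text (same format, same offset).
    return json.loads(line)["Timestamp"]


def external_sort(src_path, dst_path, chunk_lines=100_000):
    """Sort a JSONL file by timestamp, holding about chunk_lines lines in memory."""
    chunk_paths = []
    with open(src_path, encoding="utf-8") as src:
        while True:
            raw = list(islice(src, chunk_lines))
            if not raw:
                break
            # Drop blank lines and make sure every line ends with a newline.
            chunk = [ln if ln.endswith("\n") else ln + "\n" for ln in raw if ln.strip()]
            if not chunk:
                continue
            chunk.sort(key=sort_key)
            fd, path = tempfile.mkstemp(suffix=".jsonl")
            with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                tmp.writelines(chunk)
            chunk_paths.append(path)

    # Merge phase: heapq.merge reads lazily, one pending line per chunk file.
    # Note: this opens every chunk file at once, so very small chunks on a
    # very large input could hit the OS limit on open files.
    files = [open(p, encoding="utf-8") for p in chunk_paths]
    try:
        with open(dst_path, "w", encoding="utf-8") as dst:
            dst.writelines(heapq.merge(*files, key=sort_key))
    finally:
        for f in files:
            f.close()
        for p in chunk_paths:
            os.remove(p)
```

Peak memory is bounded by the chunk size rather than the input size, at the cost of writing and re-reading the data once and parsing each line's JSON again during the merge.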

Do crashes occur only with the automagic command, or with other commands as well? Since automagic is performing many commands at once, this might be the cause of so much memory usage. In that case, we might be able to make a low memory mode that does not process things in parallel, but it would be much slower.
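
To illustrate the sequential trade-off described above, here is a minimal Python sketch, not Takajo's actual code. Each analysis streams the timeline in its own pass and writes its result to disk before the next one starts, so only one analysis's state is in memory at a time. The function names and the example field names (`Level`, `RuleTitle`) are assumptions made for illustration.

```python
import json
import os
from collections import Counter


def stream_records(path):
    # Yield one parsed record at a time instead of loading the whole file.
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def count_by_field(path, field):
    # A stand-in for one analysis: tally the values of a single field.
    counts = Counter()
    for record in stream_records(path):
        counts[str(record.get(field, "n/a"))] += 1
    return counts


def run_sequentially(timeline_path, fields, out_dir):
    # One full pass over the file per analysis. Slower than a single
    # combined pass, but only one Counter is alive at any moment.
    os.makedirs(out_dir, exist_ok=True)
    for field in fields:
        counts = count_by_field(timeline_path, field)
        out_path = os.path.join(out_dir, f"{field}.json")
        with open(out_path, "w", encoding="utf-8") as out:
            json.dump(counts, out)
```

A call such as `run_sequentially("json-timeline.jsonl", ["Level", "RuleTitle"], "results")` would read the file twice. The run time grows with the number of analyses, which matches the "much slower" caveat.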