Yamato-Security / hayabusa

Hayabusa (隼) is a sigma-based threat hunting and fast forensics timeline generator for Windows event logs.
GNU General Public License v3.0

New `sort-csv` command #1295

Open YamatoSecurity opened 4 months ago

YamatoSecurity commented 4 months ago

Since `--low-memory-mode` cannot sort or remove duplicate entries, it would be nice to have a command that does this in post-processing:

`sort-csv`: sort and remove duplicate detections

@hitenkoku Since you implemented `-X, --remove-duplicate-detections`, could I ask you to do this one? We need to keep the CSV header at the top, sort by timestamp, and then remove duplicate entries. The duplicate check should ignore the `EvtxFile` column if there is one, because different .evtx files (backup files, etc.) sometimes contain the same records. I think this is the same logic as `-X, --remove-duplicate-detections`. Is that correct?
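To make the requested behavior concrete, here is a minimal in-memory sketch in Python (not Hayabusa's actual Rust implementation). It assumes the timestamp column is named `Timestamp` and that timestamps sort correctly as plain strings; the function name `sort_and_dedupe` is hypothetical.

```python
def sort_and_dedupe(header, rows, timestamp_col="Timestamp", ignore_cols=("EvtxFile",)):
    """Sort rows by timestamp and drop duplicates, ignoring selected columns.

    The header is passed separately so it always stays at the top of the output.
    """
    ts_idx = header.index(timestamp_col)
    # Only these column positions take part in the duplicate comparison.
    key_idx = [i for i, name in enumerate(header) if name not in ignore_cols]

    seen = set()
    result = []
    # sorted() is stable, so rows with equal timestamps keep their input order.
    # Assumption: timestamps are strings that sort chronologically as text.
    for row in sorted(rows, key=lambda r: r[ts_idx]):
        key = tuple(row[i] for i in key_idx)
        if key in seen:
            continue  # same record, possibly from a different .evtx file
        seen.add(key)
        result.append(row)
    return result
```

With this approach, two rows that differ only in `EvtxFile` collapse into the first one encountered after sorting.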

Options:

YamatoSecurity commented 3 months ago

@hitenkoku To sort without using a lot of memory, it might be good to import the CSV data into a temporary SQLite database, sort it there, and then export the results back to CSV. What do you think?
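A rough Python sketch of the SQLite idea, using only the standard library, might look like the following. It is an illustration under assumptions rather than a proposed implementation: the `Timestamp` column name, the `detections` table name, and the function name are all hypothetical, and every row is assumed to have as many fields as the header.

```python
import csv
import os
import sqlite3
import tempfile


def sort_csv_with_sqlite(in_path, out_path, timestamp_col="Timestamp",
                         ignore_cols=("EvtxFile",)):
    """Sort a CSV by timestamp and drop duplicates via an on-disk SQLite table."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "sort.db"))
        try:
            with open(in_path, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                header = next(reader)
                # Generic column names (c0, c1, ...) avoid quoting header text in SQL.
                cols = [f"c{i}" for i in range(len(header))]
                conn.execute(
                    "CREATE TABLE detections (%s)" % ", ".join(c + " TEXT" for c in cols)
                )
                # Rows stream from the reader into SQLite one at a time.
                conn.executemany(
                    "INSERT INTO detections VALUES (%s)" % ", ".join("?" for _ in cols),
                    reader,
                )
                conn.commit()
            ts = cols[header.index(timestamp_col)]
            keys = ", ".join(c for c, name in zip(cols, header) if name not in ignore_cols)
            # Keep the first inserted row of each duplicate group, then sort by time.
            query = (
                f"SELECT {', '.join(cols)} FROM detections "
                f"WHERE rowid IN (SELECT MIN(rowid) FROM detections GROUP BY {keys}) "
                f"ORDER BY {ts}, rowid"
            )
            with open(out_path, "w", newline="", encoding="utf-8") as dst:
                writer = csv.writer(dst)
                writer.writerow(header)  # header stays at the top
                writer.writerows(conn.execute(query))
        finally:
            conn.close()
```

The database file lives in a temporary directory and is deleted when the function returns, so only the sorted CSV remains.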

YamatoSecurity commented 3 months ago

So that an investigator can collect many CSV files from endpoints with Velociraptor, we should also support directory input:

- `-f, --file <FILE>`: Input file
- `-d, --directory <DIRECTORY>`: Input directory
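As a small illustration of how the two input options could resolve to a list of files, here is a hedged Python sketch. The function name `collect_csv_inputs` is hypothetical, and searching the directory recursively for `*.csv` is an assumption; the issue does not say whether subdirectories should be included.

```python
from pathlib import Path


def collect_csv_inputs(file=None, directory=None):
    """Return the CSV paths selected by either a file or a directory argument."""
    # Exactly one of the two inputs must be given.
    if (file is None) == (directory is None):
        raise ValueError("specify exactly one of file or directory")
    if file is not None:
        return [Path(file)]
    # Assumption: walk subdirectories too, and return paths in a stable order.
    return sorted(p for p in Path(directory).rglob("*.csv") if p.is_file())
```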