New `sort-csv` command

YamatoSecurity commented 4 months ago

Since --low-memory-mode cannot sort or remove duplicate entries, it would be nice to have a command that can do this in post-processing.

sort-csv: sort and remove duplicate detections

@hitenkoku Since you implemented -X, --remove-duplicate-detections, could I ask you to do this one? We need to keep the CSV header at the top, sort by timestamp, and then remove duplicate entries, ignoring the EvtxFile column if there is one, because different .evtx files (backup files, etc.) sometimes contain the same records. I think this is the same logic as -X, --remove-duplicate-detections. Is that correct?

Options:

-f, --file <FILE> Input file
-d, --directory <DIRECTORY> Input directory
-o, --output <FILE> Output file
-C, --clobber Overwrite files when saving
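
For illustration, the requested behaviour (keep the header, sort by timestamp, drop duplicates while ignoring EvtxFile) might look roughly like the Python sketch below. This is not Hayabusa's implementation. The function name `sort_and_dedupe`, the `Timestamp` column name, and the assumption that timestamps sort correctly as plain strings are all hypothetical, and the whole file is held in memory here.

```python
import csv
import io


def sort_and_dedupe(csv_text, timestamp_col="Timestamp", ignore_cols=("EvtxFile",)):
    """Return csv_text sorted by timestamp_col with duplicate rows removed.

    Rows count as duplicates when every column except those in ignore_cols
    matches. Assumes the timestamp strings sort correctly as text.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    # sorted() is stable, so rows with equal timestamps keep their input order.
    rows = sorted(reader, key=lambda row: row[timestamp_col])

    # Columns that take part in the duplicate check (EvtxFile is skipped if present).
    key_cols = [c for c in header if c not in ignore_cols]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header, lineterminator="\n")
    writer.writeheader()  # the header always stays on the first line
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key in seen:
            continue  # same record, possibly from a different .evtx file
        seen.add(key)
        writer.writerow(row)
    return out.getvalue()
```

With this sketch, two rows that differ only in EvtxFile collapse into the first one encountered after sorting.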

YamatoSecurity commented 3 months ago

@hitenkoku In order to sort without using a lot of memory, it might be good to import the CSV data into a temporary SQLite database, sort it there, and then export the results back to CSV. What do you think?
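
A minimal sketch of the SQLite idea, written in Python with the standard `csv` and `sqlite3` modules rather than in Hayabusa's own code: rows are streamed into a temporary on-disk database, the sort and duplicate removal are left to SQLite, and the result is streamed back out. The function names, the table name `detections`, and the `Timestamp` column name are assumptions made for the example. It also assumes every row has the same number of fields as the header.

```python
import csv
import os
import sqlite3
import tempfile


def quote_ident(name):
    """Quote a column name for use as an SQLite identifier."""
    return '"' + name.replace('"', '""') + '"'


def sort_csv_via_sqlite(in_path, out_path, timestamp_col="Timestamp",
                        ignore_cols=("EvtxFile",)):
    """Sort in_path by timestamp_col and drop duplicates using a temporary database."""
    with tempfile.TemporaryDirectory() as tmp:
        # The database file lives in a temporary directory and is removed afterwards.
        conn = sqlite3.connect(os.path.join(tmp, "sort.db"))
        try:
            with open(in_path, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                header = next(reader)
                cols = ", ".join(quote_ident(c) for c in header)
                conn.execute(f"CREATE TABLE detections ({cols})")
                marks = ", ".join("?" for _ in header)
                # executemany consumes the reader lazily, one row at a time.
                conn.executemany(f"INSERT INTO detections VALUES ({marks})", reader)

            # Keep the first inserted row of each group of duplicates;
            # ignored columns (EvtxFile) are left out of the grouping key.
            key_cols = ", ".join(quote_ident(c) for c in header if c not in ignore_cols)
            query = (
                f"SELECT {cols} FROM detections "
                f"WHERE rowid IN (SELECT MIN(rowid) FROM detections GROUP BY {key_cols}) "
                f"ORDER BY {quote_ident(timestamp_col)}, rowid"
            )
            with open(out_path, "w", newline="", encoding="utf-8") as dst:
                writer = csv.writer(dst)
                writer.writerow(header)  # header stays at the top
                writer.writerows(conn.execute(query))  # rows are streamed from the cursor
        finally:
            conn.close()
```

Whether this actually uses less memory than an in-memory sort depends on SQLite's cache and temp-storage settings, so it would need measuring on real result files.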

YamatoSecurity commented 3 months ago

So that an investigator can collect many CSV files from endpoints with Velociraptor, we should also support directory input: -f, --file <FILE> Input file and -d, --directory <DIRECTORY> Input directory.
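
As a rough sketch of how the two input options could resolve to a list of files, assuming that exactly one of them is given and that a directory is searched recursively for `*.csv` files (neither is stated in this thread; the helper name `collect_csv_inputs` is hypothetical):

```python
from pathlib import Path


def collect_csv_inputs(file=None, directory=None):
    """Return the CSV paths selected by a --file or --directory style argument."""
    # Assumption: the two options are mutually exclusive.
    if (file is None) == (directory is None):
        raise ValueError("specify exactly one of file or directory")
    if file is not None:
        return [Path(file)]
    # Assumption: sub-directories (for example one per endpoint) are searched too.
    return sorted(p for p in Path(directory).rglob("*.csv") if p.is_file())
```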

Yamato-Security / hayabusa

New `sort-csv` command #1295