DFIRKuiper / Kuiper

Digital Forensics Investigation Platform
736 stars 110 forks

Greatly improve get_by_status performance #101

Closed heck-gd closed 1 year ago

heck-gd commented 1 year ago

This PR moves the file filtering logic for global file parsing statuses from Python to the database. Our MongoDB instance holds a six-figure number of parsed files, and because this method downloaded the entire collection before parsing each file, it caused huge parsing delays.

The returned dict has a slight structural difference (machine_id instead of the entire machine), but the only user of this function is wait_memory_free, which only looks at file_size anyway.
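
To illustrate the general idea, here is a minimal sketch of the difference between filtering in Python and filtering in the database. It is not the code from this PR: the field names ("status", "machine_id", "file_size") and the helper names (build_status_query, get_by_status_slow, get_by_status_fast) are assumptions made up for the example and may not match Kuiper's actual schema. The only library call assumed is pymongo's `Collection.find(filter, projection)`.

```python
# Sketch only. Field names and function names are hypothetical and are
# not taken from the Kuiper code base.

def build_status_query(status):
    """Return (filter, projection) so the database does the filtering."""
    query_filter = {"status": status}
    # Ask only for the fields the caller needs (assumed field names).
    projection = {"_id": 0, "machine_id": 1, "file_size": 1}
    return query_filter, projection


def get_by_status_slow(collection, status):
    """Old pattern: download every document, then filter in Python."""
    return [doc for doc in collection.find({}) if doc.get("status") == status]


def get_by_status_fast(collection, status):
    """New pattern: hand the filter and projection to the database."""
    query_filter, projection = build_status_query(status)
    return list(collection.find(query_filter, projection))
```

With a pymongo collection, `get_by_status_fast(db["files"], "parsing")` would transfer only the matching documents and only the projected fields, where `db["files"]` and `"parsing"` are again placeholder names. An index on the status field would likely help further, though that is outside what this PR describes.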