CERT-Polska / mquery

YARA malware query accelerator (web frontend)
GNU Affero General Public License v3.0

feat: new JobFile model to store UrsaDB iterator batch files #420

Open mickol34 opened 1 month ago

mickol34 commented 1 month ago


What is the current behaviour?

Currently, a query to UrsaDB returns an iterator and its length, which are then passed to the batcher and the YARA parser to unpack.

What is the new behaviour?

The iterator is now popped immediately after querying, and several JobFile objects are created, each storing a batch of files to run YARA on.
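A rough sketch of the batching idea, for illustration only. The `JobFile` dataclass, its fields, and `drain_in_batches` are hypothetical stand-ins, not the actual model or helper from this PR, and a plain Python iterable takes the place of the UrsaDB iterator:

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, Iterator, List


@dataclass
class JobFile:
    """Hypothetical stand-in for the JobFile model: one batch of file paths."""
    job_id: str
    files: List[str]


def drain_in_batches(
    job_id: str, paths: Iterable[str], batch_size: int
) -> Iterator[JobFile]:
    """Consume `paths` completely, yielding one JobFile per batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(paths)
    while True:
        # islice takes up to batch_size items without loading the rest.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield JobFile(job_id=job_id, files=batch)


# A generator of five paths stands in for the popped UrsaDB iterator.
fake_iterator = (f"/samples/{i}.bin" for i in range(5))
job_files = list(drain_in_batches("job-1", fake_iterator, batch_size=2))
```

With five paths and a batch size of two, this yields three batches, the last one holding a single file.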

Test plan

The app should work the same way for various queries and rules. If some other part of the system fails while YARA is executing, the JobFile instances should remain, since they are deleted only upon successful YARA execution.
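The expected behaviour can be sketched with an in-memory SQLite table. This is an assumption-laden illustration, not the project's code: the `jobfile` table layout, `process_batches`, and the `run_yara` callback are all hypothetical.

```python
import sqlite3
from typing import Callable, List

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobfile (id INTEGER PRIMARY KEY, job_id TEXT, files TEXT)"
)
# Three hypothetical batches, file paths joined with newlines.
for batch in (["a.bin", "b.bin"], ["bad.bin"], ["c.bin"]):
    conn.execute(
        "INSERT INTO jobfile (job_id, files) VALUES (?, ?)",
        ("job-1", "\n".join(batch)),
    )
conn.commit()


def process_batches(job_id: str, run_yara: Callable[[List[str]], None]) -> None:
    """Run YARA on each stored batch; delete a batch only after it succeeds."""
    rows = conn.execute(
        "SELECT id, files FROM jobfile WHERE job_id = ? ORDER BY id", (job_id,)
    ).fetchall()
    for row_id, files in rows:
        run_yara(files.split("\n"))  # may raise; the row then stays in place
        conn.execute("DELETE FROM jobfile WHERE id = ?", (row_id,))
        conn.commit()


def remaining(job_id: str) -> int:
    """Count the batches still stored for a job."""
    query = "SELECT COUNT(*) FROM jobfile WHERE job_id = ?"
    return conn.execute(query, (job_id,)).fetchone()[0]


def flaky_yara(files: List[str]) -> None:
    """Simulated YARA run that fails on one particular file."""
    if "bad.bin" in files:
        raise RuntimeError("simulated failure during YARA execution")


try:
    process_batches("job-1", flaky_yara)
except RuntimeError:
    pass
after_failure = remaining("job-1")  # the failed and the unprocessed batch

process_batches("job-1", lambda files: None)  # retry with a run that succeeds
after_success = remaining("job-1")
```

After the simulated failure, two batches are still stored; after the successful retry, none are left.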

Closing issues

fixes #381

msm-cert commented 3 weeks ago

BTW, this fails because the QueryResult table does not exist in the DB. You need to create an Alembic migration for it (see also the recent PR about the enum rework). I haven't re-reviewed the rest of the code yet.

mickol34 commented 3 weeks ago

Regarding #381: I'm not sure ursadb.pop() is dead code, since it's used in the all_indexed_files and all_indexed_names functions. If there are any other cleanups or tests to add, please comment below and I'll fix them.

msm-cert commented 3 weeks ago

Ugh, this is tough with regards to RAM usage. Nothing wrong with this PR by itself, but I'll have to think what to suggest. Let's keep it open as a draft for now. Sorry!

By the way, in the current form storing the files in the database (QueryResult objects) is a bit pointless, since nobody ever uses them, right? :upside_down_face: