CERT-Polska / mquery

YARA malware query accelerator (web frontend)
GNU Affero General Public License v3.0

Store job files in the database instead of ursadb iterators #381

Open: msm-cert opened this issue 9 months ago

msm-cert commented 9 months ago

That way we can stop (ab)using iterators, and maybe even deprecate them in ursadb: they are a bit problematic when a job fails.

And with Postgres as the backend, storing the files there won't be a problem.

msm-cert commented 1 month ago

During a team meeting I mentioned that some metadata is still in Redis instead of Postgres. It looks like I was wrong: that was the last remaining item, and it was fixed in February.

But there are still some things that are not in Postgres but should be. This includes the list of files matched by the ursadb query.

In the `query_ursadb` function, we first select files into an ursadb iterator (using an `into iterator {query}` statement), and then in the `run_yara_batch` function we "pop" files from that iterator and run YARA on them.
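To make the current flow concrete, here is a toy in-memory model of it. `IteratorModel` and `run_batches` are hypothetical names, not ursadb's or mquery's API. The model only assumes that popping returns files and advances the iterator, so a batch that fails after being popped is never handed out again; that is one plausible reading of why iterators are awkward for failed jobs, not a description of ursadb internals.

```python
from collections import deque
from typing import Callable, Deque, List


class IteratorModel:
    """Toy stand-in for an ursadb iterator (hypothetical, not ursadb's API).

    pop() advances the iterator: files it returns are no longer held by it.
    """

    def __init__(self, files: List[str]) -> None:
        self._files: Deque[str] = deque(files)

    def pop(self, count: int) -> List[str]:
        batch: List[str] = []
        while self._files and len(batch) < count:
            batch.append(self._files.popleft())
        return batch

    def remaining(self) -> int:
        return len(self._files)


def run_batches(it: IteratorModel, batch_size: int,
                scan: Callable[[List[str]], None]) -> None:
    """Hypothetical analogue of the run_yara_batch loop: pop, then scan."""
    while True:
        batch = it.pop(batch_size)
        if not batch:
            break
        scan(batch)  # if this raises, the popped batch is already gone


def failing_scan(batch: List[str]) -> None:
    raise RuntimeError("worker crashed")


it = IteratorModel([f"/samples/{i}.bin" for i in range(5)])
try:
    run_batches(it, 2, failing_scan)
except RuntimeError:
    pass

print(it.remaining())  # 3: the first batch of 2 was popped but never scanned
```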

Instead, we should run a normal query, save all prefiltered files to the database, and then read unprocessed files from the database rather than from ursadb.

This should be a separate table (not `Match`) containing just the job ID and the file path. It should work a bit like a task queue: files should be removed from it once they have been processed.
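A minimal sketch of that proposal, covering both the save step and the queue behaviour. Assumptions: the table name `queued_file`, its `id` column, and the helpers `save_prefiltered_files`, `process_next_batch` and `remaining` are hypothetical; the issue only specifies a job ID and a file path. The standard-library `sqlite3` module stands in for Postgres so the sketch runs without a server. Each batch is deleted in the same transaction that runs the scan, so if the scan raises, the rollback leaves those files queued for a retry.

```python
import sqlite3
from typing import Callable, List, Tuple

# Hypothetical schema: only job_id and path come from the issue.
SCHEMA = """
CREATE TABLE IF NOT EXISTS queued_file (
    id     INTEGER PRIMARY KEY,
    job_id TEXT NOT NULL,
    path   TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS queued_file_job ON queued_file (job_id);
"""


def save_prefiltered_files(conn: sqlite3.Connection, job_id: str,
                           paths: List[str]) -> None:
    """Store every file returned by the (non-iterator) prefilter query."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO queued_file (job_id, path) VALUES (?, ?)",
            [(job_id, p) for p in paths],
        )


def process_next_batch(conn: sqlite3.Connection, job_id: str, batch_size: int,
                       scan: Callable[[List[str]], None]) -> int:
    """Scan up to batch_size queued files; returns how many were processed.

    The rows are deleted before scan() runs, but the deletion only becomes
    permanent when the transaction commits after scan() returns.
    """
    with conn:
        rows: List[Tuple[int, str]] = conn.execute(
            "SELECT id, path FROM queued_file WHERE job_id = ? "
            "ORDER BY id LIMIT ?",
            (job_id, batch_size),
        ).fetchall()
        if not rows:
            return 0
        conn.executemany("DELETE FROM queued_file WHERE id = ?",
                         [(row_id,) for row_id, _ in rows])
        scan([path for _, path in rows])  # an exception here rolls back
    return len(rows)


def remaining(conn: sqlite3.Connection, job_id: str) -> int:
    return conn.execute("SELECT COUNT(*) FROM queued_file WHERE job_id = ?",
                        (job_id,)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
save_prefiltered_files(conn, "job-1", [f"/samples/{i}.bin" for i in range(5)])


def failing_scan(batch: List[str]) -> None:
    raise RuntimeError("worker crashed")


try:
    process_next_batch(conn, "job-1", 2, failing_scan)
except RuntimeError:
    pass
print(remaining(conn, "job-1"))  # 5: the failed batch is still queued

scanned: List[str] = []
while process_next_batch(conn, "job-1", 2, scanned.extend):
    pass
print(len(scanned), remaining(conn, "job-1"))  # 5 0
```

Keeping the transaction open for the whole scan is the simplest way to get delete-after-processing semantics, but YARA scans can be slow, so a real implementation might instead mark rows as claimed and delete them afterwards. With several workers on Postgres, the batch `SELECT` would typically add `FOR UPDATE SKIP LOCKED` so two workers never claim the same rows; SQLite has no equivalent, which is why this sketch assumes a single worker.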

In short, the roadmap: