CERT-Polska / drakvuf-sandbox

DRAKVUF Sandbox - automated hypervisor-level malware analysis system
https://drakvuf-sandbox.readthedocs.io/

Refactoring: Move postprocessing from drakcore to drakrun #901

Closed psrok1 closed 1 week ago

psrok1 commented 1 month ago

This PR integrates postprocessing into drakrun, so post-processing is performed in the DrakrunKarton instead of a separate service.

Motivation

Most of the "postprocessing" in DRAKVUF Sandbox is just splitting the information from drakrun.log, which contains the DRAKVUF output. drak-postprocess builds indexes and separate files for data coming from the various plugins, so logs are easier to browse without reading the whole drakrun.log at once. It also tries to build helper files like "wireshark_key_file.txt", "process_tree.json" and "graph.dot"; the last one contains the result of procmon2dot, if it is installed under a very specific path.
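The per-plugin split described above can be sketched roughly as follows. This is a minimal illustration, not the actual drak-postprocess code; it assumes drakrun.log is JSON-lines output where each event carries a "Plugin" key, which is how DRAKVUF's JSON output format tags events:

```python
import json
from collections import defaultdict
from pathlib import Path


def split_drakvuf_log(log_path: Path, output_dir: Path) -> dict:
    """Split a DRAKVUF JSON-lines log into one file per plugin.

    Hypothetical sketch: assumes each line is a JSON object with a
    "Plugin" key. Returns a per-plugin line count.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    counts = defaultdict(int)
    handles = {}
    try:
        with log_path.open() as log:
            for line in log:
                try:
                    plugin = json.loads(line)["Plugin"]
                except (json.JSONDecodeError, KeyError):
                    plugin = "unknown"  # keep unparseable lines together
                if plugin not in handles:
                    handles[plugin] = (output_dir / f"{plugin}.log").open("w")
                handles[plugin].write(line)
                counts[plugin] += 1
    finally:
        for fh in handles.values():
            fh.close()
    return dict(counts)
```

The point is that this is cheap, line-oriented work: a single pass over the log with no heavy computation involved.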

To do this, all of this data is uploaded back and forth to S3 object storage, and the processing logic that determines the final structure of analysis artifacts is split across two services. At the same time, the "postprocessing" is not very time-consuming, so we don't gain any significant advantage from doing it in a separate worker. If a user wants to perform some actual enrichment of artifacts that may require real processing, they need to use an external Karton consumer, as we do with karton-yaramatcher and karton-config-extractor.

What was done in this PR?

What was not done in this PR?

So the funny thing is that drakcore.process.AnalysisProcessor is still there. Why? Because drakcore actually needs a worker to keep track of analyses and put their metadata in the internal database after each analysis is finished.

S3 storage isn't a good database, as it's meant to store objects. Listing the N latest objects and gathering metadata from each metadata.json object isn't something we can expect S3 storage to support well.
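To make the access pattern concrete, here is a hedged sketch of what "list the N latest analyses" looks like against an S3-style API: one LIST call plus one GET per analysis, because S3 offers no query interface over object contents. The bucket layout (`<analysis_uid>/metadata.json`), the `time_finished` key, and the client interface are illustrative assumptions, not the actual drakvuf-sandbox schema; the two methods used match the boto3 S3 client's `list_objects_v2` and `get_object`:

```python
import json


def list_latest_analyses(s3, bucket: str, n: int) -> list:
    """Gather metadata for the N most recent analyses from S3.

    Demonstrates the N+1 access pattern: even to show the top N,
    every metadata.json must be fetched before sorting.
    """
    # One LIST call: top-level "directories" are analysis UIDs (assumed layout)
    response = s3.list_objects_v2(Bucket=bucket, Delimiter="/")
    prefixes = [p["Prefix"] for p in response.get("CommonPrefixes", [])]
    analyses = []
    for prefix in prefixes:
        # One GET per analysis: this is the costly part
        obj = s3.get_object(Bucket=bucket, Key=prefix + "metadata.json")
        analyses.append(json.loads(obj["Body"].read()))
    # S3 cannot sort by object content, so sorting happens client-side
    analyses.sort(key=lambda meta: meta.get("time_finished", 0), reverse=True)
    return analyses[:n]
```

A relational database turns this into a single indexed `ORDER BY ... LIMIT N` query, which is why a proper database is the natural fit for the analysis listing.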

If we want to keep a listing of pending/finished analyses, drakcore probably needs a proper database and a worker that persists the status of each analysis. Karton tasks are designed to be volatile messages; they vanish after they're processed. Drakrun could send special tasks that notify drakcore about the evaluated execution parameters at the beginning and about the results of the analysis at the end. But this worker should not be placed on the critical path of the processing pipeline.
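A rough sketch of what such status notifications and their persistence could look like. Everything here is hypothetical (payload keys, table schema, status names); in the real system the payload would travel as a Karton task with routing headers, consumed by a drakcore-side worker off the critical path. SQLite stands in for "a proper database":

```python
import sqlite3
import time


def make_status_payload(analysis_uid: str, status: str, metadata: dict) -> dict:
    """Build the payload for a status-notification task.

    In Karton terms, drakrun would wrap this in a Task with headers
    like {"type": "analysis-status"} (hypothetical header) and send it
    at the start and end of each analysis.
    """
    return {
        "analysis_uid": analysis_uid,
        "status": status,          # e.g. "started" or "finished"
        "timestamp": time.time(),
        "metadata": metadata,      # evaluated parameters or final results
    }


def persist_status(db: sqlite3.Connection, payload: dict) -> None:
    """What the drakcore-side worker would do with such a task:
    upsert the analysis row, instead of scanning S3 objects."""
    db.execute(
        "INSERT INTO analyses (uid, status, updated_at) VALUES (?, ?, ?) "
        "ON CONFLICT(uid) DO UPDATE SET status=excluded.status, "
        "updated_at=excluded.updated_at",
        (payload["analysis_uid"], payload["status"], payload["timestamp"]),
    )
```

Because the tasks are volatile, the database row becomes the durable record of the analysis; losing a notification only means a stale status, never a broken analysis, which is what keeps the worker off the critical path.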

msm-cert commented 4 weeks ago

> If someone really wants raw artifacts, e.g. during debugging or other manual Drakrun interaction using drakstart, they can disable the postprocess option and run drakpostprocess separately.

May be useful for raw capa-related analyses (they run on the raw drakrun.log, but they also run as a part of postprocessing)

> Some postprocessing, like crop_dumps and compress_ipt, was already done by drakrun 😄 I have integrated these actions into the postprocessing engine

nice!

> The main difference is that we work directly on the analysis files on the local filesystem, instead of making S3 calls on S3 objects

also nice.

> Because drakcore actually needs a worker to keep track of analyses and put the metadata in the internal database after each analysis is finished.

Maybe next time!