**Closed** · psrok1 closed 1 week ago
> If someone really wants raw artifacts, e.g. during debugging or other manual Drakrun interaction using `drakstart`, they can disable the `postprocess` option and run `drakpostprocess` separately.

May be useful for raw capa-related analyses (they run on the raw `drakrun.log`, but they also run as a part of postprocessing).

> Some postprocessing like `crop_dumps` and `compress_ipt` was already done by drakrun 😄 I have integrated these actions into the postprocessing engine.

nice!

> The main difference is that we work directly on the analysis files on the local filesystem, instead of making S3 calls on S3 objects.

also nice.

> Because drakcore actually needs a worker that keeps track of the analysis and puts the metadata in the internal database after it is finished.

Maybe next time!
This PR integrates postprocessing with drakrun, so that it is performed in the DrakrunKarton instead of a separate service.

## Motivation
Most of the "postprocessing" in DRAKVUF Sandbox is just splitting up the information from `drakrun.log`, which contains the Drakvuf output. `drak-postprocess` builds indexes and separate files for the data coming from the various plugins, so that logs are easier to browse without reading the whole `drakrun.log` at once. It also tries to build helper files like `wireshark_key_file.txt`, `process_tree.json` and `graph.dot`; the last one contains the result of procmon2dot, if it is installed under a very specific path.

To do this, all these data are uploaded back and forth to the S3 object storage, and the processing logic that determines the final structure of the analysis artifacts is split across two services. At the same time, the "postprocessing" is not very time-consuming, so we don't gain any significant advantage from doing it in a separate worker. If users want actual enrichment of the artifacts that requires real processing, they need to use an external Karton consumer, as we do with `karton-yaramatcher` and `karton-config-extractor`.

## What was done in this PR?
Added the `drakrun.postprocess` module with a `postprocess_analysis` function that gets the analysis path and launches "postprocessing plugins" on that path, keeping it similar to the original logic. The main difference is that we work directly on the analysis files on the local filesystem, instead of making S3 calls on S3 objects. Postprocessing is launched as a part of the analysis (`drakrun.analyzer.analyze_sample`). If someone really wants raw artifacts, e.g. during debugging or other manual Drakrun interaction using `drakstart`, they can disable the `postprocess` option and run `drakpostprocess` separately.

Some postprocessing like `crop_dumps` and `compress_ipt` was already done by drakrun 😄 I have integrated these actions into the postprocessing engine.
## What was not done in this PR?

So the funny thing is that `drakcore.process.AnalysisProcessor` is still there. Why? Because drakcore actually needs a worker that keeps track of the analysis and puts the metadata in the internal database after it is finished.

S3 storage isn't a good database, as it's meant to work with objects. Listing the N latest objects and gathering the metadata from the `metadata.json` object isn't something we expect to be well supported by S3 storage.

If we want to keep having a listing of pending/finished analyses, drakcore probably needs a proper database and a worker that persists the status of each analysis. Karton tasks are designed to be volatile messages and they vanish after they're processed. Drakrun may send special tasks that notify drakcore about the evaluated execution parameters at the beginning and the effects of the analysis at the end. But this worker should not be placed on the critical path of the processing pipeline.
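To make the "proper database" idea concrete, here is a hedged sketch of the kind of status store such a worker could persist notifications into. The class name, schema, and methods are hypothetical, not anything drakcore currently ships; the point is that a "list N latest analyses" query becomes one indexed SELECT instead of listing S3 objects and fetching a `metadata.json` per analysis:

```python
import sqlite3
import time


class AnalysisStatusStore:
    """Hypothetical store that a drakcore worker could update when
    drakrun sends its started/finished notification tasks."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS analysis ("
            " uid TEXT PRIMARY KEY,"
            " status TEXT NOT NULL,"
            " started_at REAL,"
            " finished_at REAL)"
        )

    def mark_started(self, uid: str) -> None:
        # Called when the notification about evaluated execution
        # parameters arrives at the beginning of the analysis.
        self.db.execute(
            "INSERT OR REPLACE INTO analysis VALUES (?, 'pending', ?, NULL)",
            (uid, time.time()),
        )
        self.db.commit()

    def mark_finished(self, uid: str) -> None:
        # Called when the end-of-analysis notification arrives.
        self.db.execute(
            "UPDATE analysis SET status='finished', finished_at=? WHERE uid=?",
            (time.time(), uid),
        )
        self.db.commit()

    def latest(self, n: int = 10):
        # Cheap listing of the N latest analyses with their status;
        # the S3 equivalent would need a bucket listing plus one
        # metadata.json fetch per analysis.
        cur = self.db.execute(
            "SELECT uid, status FROM analysis"
            " ORDER BY started_at DESC LIMIT ?",
            (n,),
        )
        return cur.fetchall()
```

Since the worker only consumes notification tasks and writes rows, it stays entirely off the critical path of the processing pipeline.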