CERT-Polska / karton

Distributed malware processing framework based on Python, Redis and S3.
https://karton-core.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Task tree: backend performance improvements #255

Closed psrok1 closed 3 months ago

psrok1 commented 3 months ago

This PR is a follow-up to https://github.com/CERT-Polska/karton/pull/207

Instead of maintaining extra mappings, we can simply generate task uids in the format {root_uid}:uid. Most tasks in a Karton environment are "produced" by karton-system during routing, so the performance gain will be significant even if most consumers have not yet been upgraded to the latest version of Karton.
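A minimal sketch of how such identifiers could be built and taken apart, based only on the format visible in the log output in this description. The helper names `make_task_uid`, `parse_task_uid` and `uids_in_tree` are hypothetical and are not taken from the Karton codebase; treating uids without the `{...}:` prefix as legacy is also an assumption.

```python
import uuid
from typing import Iterable, List, Optional, Tuple


def make_task_uid(root_uid: Optional[str] = None) -> str:
    """Build a uid of the form '{root_uid}:task_uid' (hypothetical helper).

    When no root is given, the task is treated as the root of a new tree
    and its own UUID is reused as the root part.
    """
    task_uid = str(uuid.uuid4())
    if root_uid is None:
        root_uid = task_uid
    # Doubled braces in an f-string emit literal '{' and '}'.
    return f"{{{root_uid}}}:{task_uid}"


def parse_task_uid(uid: str) -> Tuple[Optional[str], str]:
    """Split a uid into (root_uid, task_uid).

    Uids without the '{root}:' prefix are assumed to be old-style
    identifiers and yield (None, uid).
    """
    if uid.startswith("{") and "}:" in uid:
        root, _, task = uid[1:].partition("}:")
        return root, task
    return None, uid


def uids_in_tree(uids: Iterable[str], root_uid: str) -> List[str]:
    """Select the uids belonging to one task tree by prefix alone,
    without consulting any separate root-to-task mapping."""
    prefix = f"{{{root_uid}}}:"
    return [uid for uid in uids if uid.startswith(prefix)]
```

The point of the sketch is that tree membership can be read off the identifier itself, which is what makes a separate mapping unnecessary.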

The only disadvantage is that task identifiers become longer (75 characters instead of the 36 of a bare UUID), e.g. {57d3630e-c86d-4931-b071-d1fafb02610d}:f03f9927-dd6a-4275-8706-f922efb7a331. This may have some impact on log size and Redis memory usage:

[2024-05-14 12:38:06,884][INFO] Received new task - {57d3630e-c86d-4931-b071-d1fafb02610d}:f03f9927-dd6a-4275-8706-f922efb7a331
{"error": null, "headers": {"origin": "", "quality": "high", "receiver": "karton.wait-for-it", "type": "task"}, "headers_persistent": {"quality": "high"}, "last_update": 1715683086.8855345, "orig_uid": "{57d3630e-c86d-4931-b071-d1fafb02610d}:57d3630e-c86d-4931-b071-d1fafb02610d", "parent_uid": null, "payload": {"payload": {"echo": "off"}}, "payload_persistent": {"__headers_persistent": {"quality": "high"}}, "priority": "normal", "root_uid": "57d3630e-c86d-4931-b071-d1fafb02610d", "status": "Started", "uid": "{57d3630e-c86d-4931-b071-d1fafb02610d}:f03f9927-dd6a-4275-8706-f922efb7a331"}
[2024-05-14 12:38:16,886][INFO] Task done - {57d3630e-c86d-4931-b071-d1fafb02610d}:f03f9927-dd6a-4275-8706-f922efb7a331
[2024-05-14 12:38:16,892][INFO] Received new task - {3d41423a-a99b-43ba-8e89-6c9dcfd71a50}:cbf7c5ec-97da-446d-b65f-1d72997ffada

On the other hand, having root_uid in the logs is quite useful for tracking issues at the analysis level and correlating logs with actual samples, so this side effect is also an advantage.
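As a rough illustration of that correlation, the sketch below groups log lines by the root_uid embedded in the task identifier. The regular expression and the function name `group_by_root` are assumptions made for this example; the sample lines are copied from the log output in this description.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches '{root_uid}:task_uid' where both parts look like UUID strings.
UID_RE = re.compile(r"\{([0-9a-f-]{36})\}:([0-9a-f-]{36})")


def group_by_root(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group log lines by the root_uid found in the first task uid on each line.

    Lines without a recognisable uid are skipped.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        match = UID_RE.search(line)
        if match:
            groups[match.group(1)].append(line)
    return dict(groups)


sample_log = [
    "[2024-05-14 12:38:06,884][INFO] Received new task - "
    "{57d3630e-c86d-4931-b071-d1fafb02610d}:f03f9927-dd6a-4275-8706-f922efb7a331",
    "[2024-05-14 12:38:16,886][INFO] Task done - "
    "{57d3630e-c86d-4931-b071-d1fafb02610d}:f03f9927-dd6a-4275-8706-f922efb7a331",
    "[2024-05-14 12:38:16,892][INFO] Received new task - "
    "{3d41423a-a99b-43ba-8e89-6c9dcfd71a50}:cbf7c5ec-97da-446d-b65f-1d72997ffada",
]

groups = group_by_root(sample_log)
```

Here `groups` has one entry per analysis, so all lines for a given sample can be pulled out without looking up the task in Redis first.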

To actually get a performance gain from this change, further changes in karton.inspect and karton.backend are included as well:

Changes planned after merging this PR: