Instead of adding extra mappings, we can simply generate task uids in the format {root_uid}:uid. Most tasks in a Karton environment are "produced" by karton-system while routing, so the performance gain will be significant even if most consumers have not yet been upgraded to the latest version of Karton.
The only disadvantage is that task identifiers become fairly long, e.g. {57d3630e-c86d-4931-b071-d1fafb02610d}:f03f9927-dd6a-4275-8706-f922efb7a331. This may have some impact on the size of logs and on Redis memory usage.
On the other hand, having root_uid in logs is quite useful for tracking issues at the analysis level and correlating logs with actual samples, so this side effect is an advantage as well.
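A minimal sketch of the identifier format described above, for illustration only. The helper names make_task_uid and split_task_uid are hypothetical and are not part of the Karton API. The sketch assumes the braces around root_uid are literal, as in the example identifier, and that legacy uids carry no prefix.

```python
import uuid
from typing import Optional, Tuple


def make_task_uid(root_uid: str) -> str:
    """Build a uid of the form {root_uid}:uid (hypothetical helper)."""
    # Doubled braces in an f-string produce literal "{" and "}".
    return f"{{{root_uid}}}:{uuid.uuid4()}"


def split_task_uid(task_uid: str) -> Tuple[Optional[str], str]:
    """Split a uid into (root_uid, short_uid).

    Legacy uids without the {root_uid}: prefix are returned
    unchanged, with None as the root part.
    """
    if task_uid.startswith("{") and "}:" in task_uid:
        root_uid, _, short_uid = task_uid[1:].partition("}:")
        return root_uid, short_uid
    return None, task_uid
```

With this shape, the root uid can be read straight from a log line or a key name without loading the task body, and the short form stays available for display.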
To actually get a performance gain from this change, further changes in karton.inspect and karton.backend are included as well:
New KartonBackend method iter_task_tree that gathers all tasks related to the provided root_uid. It also lists tasks with legacy uids (matching karton.task:*[^:]*), which should mostly be unrouted tasks waiting for karton-system processing.
KartonState is now lazy and gathers information about all tasks in the system only when one of the KartonState properties is used. In addition, there is a new method get_analysis that returns only the tasks coming from the specified root_uid.
KartonState has resource parsing turned off by default (parse_resources=False). Converting __karton_resource__ payload entries to actual Resource objects is unnecessary for the dashboard and for analysis status checking. At the same time, skipping it makes task deserialization much faster.
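The idea behind these changes can be sketched with an in-memory stand-in. This is not the actual Karton implementation: FakeBackend, LazyState, iter_all_tasks and the full_scans counter are hypothetical names, a plain dict replaces Redis, and fnmatch replaces Redis key matching. The sketch assumes legacy tasks have to be loaded and filtered by their stored root_uid, because their keys do not carry it.

```python
import fnmatch
from functools import cached_property
from typing import Dict, Iterator, List

TASK_NAMESPACE = "karton.task"  # key prefix taken from the pattern quoted above


class FakeBackend:
    """Hypothetical in-memory stand-in for the Redis-backed backend."""

    def __init__(self, store: Dict[str, dict]) -> None:
        self.store = store
        self.full_scans = 0  # counts how often every task key was visited

    def iter_all_tasks(self) -> Iterator[str]:
        """Yield every task key (the expensive path)."""
        self.full_scans += 1
        for key in self.store:
            if key.startswith(TASK_NAMESPACE + ":"):
                yield key

    def iter_task_tree(self, root_uid: str) -> Iterator[str]:
        """Yield task keys that belong to a single root_uid."""
        # New-style keys embed the root uid, so a key pattern is enough.
        # "{" and "}" are ordinary characters for fnmatch.
        pattern = f"{TASK_NAMESPACE}:{{{root_uid}}}:*"
        for key, task in self.store.items():
            if fnmatch.fnmatchcase(key, pattern):
                yield key
            elif (
                key.startswith(TASK_NAMESPACE + ":")
                and not key.startswith(TASK_NAMESPACE + ":{")
                and task.get("root_uid") == root_uid
            ):
                # Legacy key: the root uid is only known from the task body.
                yield key


class LazyState:
    """Loads the full task list only when the property is first read."""

    def __init__(self, backend: FakeBackend) -> None:
        self.backend = backend

    @cached_property
    def tasks(self) -> List[str]:
        return sorted(self.backend.iter_all_tasks())

    def get_analysis(self, root_uid: str) -> List[str]:
        # Does not touch self.tasks, so no full scan is triggered.
        return sorted(self.backend.iter_task_tree(root_uid))


store = {
    "karton.task:{r1}:a": {"root_uid": "r1"},
    "karton.task:{r1}:b": {"root_uid": "r1"},
    "karton.task:{r2}:e": {"root_uid": "r2"},
    "karton.task:c": {"root_uid": "r1"},  # legacy uid, same analysis
    "karton.task:d": {"root_uid": "r2"},  # legacy uid, other analysis
    "karton.queue:x": {},                 # not a task key
}
backend = FakeBackend(store)
state = LazyState(backend)
analysis_keys = state.get_analysis("r1")
```

In this sketch, asking for a single analysis never populates the full task list. The full scan happens only once, on the first read of state.tasks.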
Changes planned after merging this PR:
Convert {root_uid}:uid back to uid in karton-dashboard views so they are less bloated with long UIDs.
Use KartonState.get_analysis in MWDB to make analysis status gathering much quicker.
This PR is a follow-up to https://github.com/CERT-Polska/karton/pull/207