TopEFT / topeft


Tool for TopEFT workflow visualization #304

Closed andrewhennessee closed 2 years ago

andrewhennessee commented 2 years ago

Original PR #280

This tool parses through the task accumulation log (tasks.log) for a topEFT run and generates a visualization of the workflow. An example is attached below:

workflow-graph.pdf

Using a Graphviz Python library, the program draws edges between processing and accumulating task nodes (ellipses), with intermediate nodes for their output files (boxes). For now, preprocessing tasks are excluded from the visualization.
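The node and edge layout described above can be sketched as follows. This is only an illustration, not the code in the PR: it writes DOT text directly instead of going through a Graphviz Python library, and the task dictionary keys (`id`, `category`, `inputs`, `output`), the `p`/`a` category letters, and the file names are all hypothetical.

```python
def build_dot(tasks):
    """Return DOT source: ellipses for tasks, boxes for files, edges between them.

    `tasks` is a list of dicts with hypothetical keys:
    'id', 'category', 'inputs' (list of file names), 'output' (file name).
    """
    lines = ["digraph workflow {"]
    seen_files = set()

    def file_node(name):
        # Build a DOT-safe identifier and declare each file box only once.
        node_id = "file_" + "".join(c if c.isalnum() else "_" for c in name)
        if node_id not in seen_files:
            seen_files.add(node_id)
            lines.append(f'  {node_id} [shape=box, label="{name}"];')
        return node_id

    for task in tasks:
        task_id = f"task_{task['id']}"
        label = f"{task['id']}{task['category']}"
        lines.append(f'  {task_id} [shape=ellipse, label="{label}"];')
        # Input files feed the task; the task feeds its output file.
        for name in task["inputs"]:
            lines.append(f"  {file_node(name)} -> {task_id};")
        lines.append(f"  {task_id} -> {file_node(task['output'])};")

    lines.append("}")
    return "\n".join(lines)


# Two processing tasks whose outputs are merged by one accumulating task.
example_tasks = [
    {"id": 1, "category": "p", "inputs": [], "output": "out_1.p"},
    {"id": 2, "category": "p", "inputs": [], "output": "out_2.p"},
    {"id": 3, "category": "a", "inputs": ["out_1.p", "out_2.p"], "output": "out_3.p"},
]
dot_source = build_dot(example_tasks)
```

Declaring each file box once and reusing its identifier is what lets an accumulating task attach to the outputs of earlier tasks, so the chain of processing and accumulation shows up as a connected graph.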

Each processing/accumulating node shows information from the log: task ID, category (the lowercase letter next to the ID, indicating "processing" or "accumulating"), CPU time, wall time, memory (in MB), and the range of events. Node color scales with CPU time and node size scales with memory, each relative to the maximum for the run: redder nodes had higher CPU times, and larger nodes used more memory.
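A minimal sketch of this kind of relative scaling is shown below. The white-to-red color ramp, the width range of 1 to 3 inches, and the function name `node_style` are assumptions chosen for illustration; the actual tool may map values differently. `style`, `fillcolor`, and `width` are standard Graphviz node attributes.

```python
def node_style(cpu_time, memory_mb, max_cpu, max_memory):
    """Map a task's CPU time and memory to Graphviz node attributes.

    Both values are scaled relative to the run maximum (assumed mapping).
    """
    cpu_frac = min(cpu_time / max_cpu, 1.0) if max_cpu > 0 else 0.0
    mem_frac = min(memory_mb / max_memory, 1.0) if max_memory > 0 else 0.0

    # Fade green/blue channels toward 0 as CPU time grows: white -> red.
    fade = round(255 * (1.0 - cpu_frac))
    fillcolor = f"#ff{fade:02x}{fade:02x}"

    # Width in inches grows linearly with memory: 1.0 (min) to 3.0 (max).
    width = 1.0 + 2.0 * mem_frac

    return {"style": "filled", "fillcolor": fillcolor, "width": f"{width:.2f}"}


hottest = node_style(cpu_time=10.0, memory_mb=100.0, max_cpu=10.0, max_memory=100.0)
idle = node_style(cpu_time=0.0, memory_mb=50.0, max_cpu=10.0, max_memory=100.0)
```

Here `hottest` comes out fully red at the largest width, while `idle` is white at half of the extra width.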

Each output file node shows the size of the file. Originally, the task accumulation log did not track the sizes of input and output files for processing and accumulating tasks, so I updated the work_queue_tools.py script from coffea to record this information in the log. Tracking of the input files (not used by this tool) still needs a little debugging, and I'm in the process of writing a PR to propose these changes.

After parsing the log, the tool first generates a .gv file that describes the structure and formatting of the graph in DOT syntax. The .gv file is then processed with Graphviz's dot command to produce a PDF of the visualization. The tool can be run using this command: ./topEFT_workflow_viz.py tasks.log
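The .gv to PDF step can be sketched like this, assuming the Graphviz `dot` executable is on the PATH. The helper names `dot_command` and `render` are hypothetical and are not taken from the tool itself; `-T` and `-o` are standard `dot` options.

```python
import shutil
import subprocess
from pathlib import Path


def dot_command(gv_path, fmt="pdf"):
    """Build the dot invocation that renders gv_path next to itself."""
    gv_path = Path(gv_path)
    out_path = gv_path.with_suffix("." + fmt)
    return ["dot", f"-T{fmt}", str(gv_path), "-o", str(out_path)]


def render(gv_path, fmt="pdf"):
    """Run dot on an existing .gv file; raise if Graphviz is not installed."""
    if shutil.which("dot") is None:
        raise RuntimeError("Graphviz 'dot' executable not found on PATH")
    subprocess.run(dot_command(gv_path, fmt), check=True)


command = dot_command("workflow-graph.gv")
```

Keeping the .gv file as an intermediate makes it possible to re-render in another format (for example `fmt="svg"`) without parsing the log again.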

This tool will be useful for identifying potential bottlenecks within a topEFT run. Users can pinpoint tasks that are performing sub-optimally based on the resources they consumed. The visualization also helps users decide which tasks should be run locally and which should be run remotely. The visualization becomes more interesting and more useful when the tool is run on larger workflows.

kmohrman commented 2 years ago

Thanks @andrewhennessee, this looks good from my end (and looks like it's passing all of the relevant CI, so that's good as well). @btovar, I'm wondering if you had any additional comments on this PR or if you are also happy for it to be merged at this point?

btovar commented 2 years ago

Looks good!

kmohrman commented 2 years ago

Ok, sounds good! Since @bryates has also approved the PR (the original one), I'll go ahead and merge now.