Open benclifford opened 4 years ago
So for each task in a workflow, if its `task_hashsum` is not None, we want to trace back through previous runs for its info (e.g., status and …).
I could imagine situations where some info is missing, for example if a user removes the monitoring DB but does not remove runinfo (so it has checkpoints but no records in the monitoring DB).
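(For concreteness, here is a minimal sketch of that trace-back, assuming a SQLite monitoring DB whose `task` table has `run_id`, `task_id`, `task_hashsum`, `task_time_invoked`, and `task_time_returned` columns - those column names are assumptions about the monitoring schema, not something stated in this issue.)

```python
import sqlite3

def previous_records(db_path, task_hashsum):
    """Records from any run in this monitoring DB that share a task_hashsum.

    Assumes a SQLite monitoring DB whose `task` table has run_id, task_id,
    task_hashsum, task_time_invoked and task_time_returned columns
    (assumed names, not a confirmed schema).
    """
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(
            "SELECT run_id, task_id, task_time_invoked, task_time_returned "
            "FROM task WHERE task_hashsum = ? "
            "ORDER BY task_time_invoked",
            (task_hashsum,),
        )
        return cur.fetchall()
    finally:
        conn.close()
```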
Hi @ZhuozhaoLi
Yes, the user could remove monitoring.db without removing runinfo and that would be bad. Would it make sense to store monitoring.db inside the runinfo/ directory? That way there would be a better chance of all-or-nothing.
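(A hedged sketch of what that co-location could look like, assuming `MonitoringHub`'s `logging_endpoint` takes a SQLAlchemy URL and that pointing it under the run directory is workable; the exact path here is illustrative, not a decided design.)

```python
import parsl
from parsl.config import Config
from parsl.monitoring import MonitoringHub
from parsl.addresses import address_by_hostname

# Illustrative only: keep the monitoring DB next to the checkpoints so that
# deleting runinfo/ removes both, rather than leaving orphaned checkpoints
# with no monitoring records.
config = Config(
    run_dir="runinfo",
    monitoring=MonitoringHub(
        hub_address=address_by_hostname(),
        logging_endpoint="sqlite:///runinfo/monitoring.db",
    ),
)

parsl.load(config)
```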
**Is your feature request related to a problem? Please describe.**
Sometimes a parsl python program is run several times without change, with checkpointing turned on, to drive towards a final completed set of outputs.
The present visualization code doesn't give much help in seeing what happened across such a series: it shows each run as a separate workflow that can be visualised on its own, but can't show anything more integrated.
**Describe the solution you'd like**
There are different things that could happen here. In many of the time-based plots, it might make sense to concatenate the graphs for each of a series of runs along the x-axis.
@tomglanzman has experimented a bit with using the `task_hashsum` field to tie together information about app invocations across multiple runs - for example, given a cached app invocation in the current run, go back through previous runs to find the original execution information. This addresses questions like "give me a histogram of the execution time of the runs of each command", discarding failed attempts and memoized attempts, which take almost no time.

**Additional context**
This is in the context of visualizing large runs for LSST DESC DM work.