Open cswartzvi opened 1 month ago
I'll let Elijah give you feedback. Just wanted to say that the progress bar looks awesome!
@elijahbenizzy while reviewing this code, we can think about how this could help with caching. The NodeGroupPurpose
seems to match the NodeRoleInTaskExecution
(used here) and the spawning parent would help determine the role of dependencies when creating the cache key (here)
Nice! That progress bar is beautiful. I think that, on the surface, these are quite reasonable, and I'm happy to move towards merging these! Will list a set of requirements we'll want to make this into production.
The way to think about hooks is a contract for the executor to follow. In this case, we only have one general execution system (the dynamic one), as we update (E.G. add async support for parallelism, etc...), we'll have more. So as long as the parameters make sense, or have reasonable default values, they're fine to add!
Looking at your code you've done it correctly -- added both layers (internal), and added parameters in the backwards compatible way. The other thing you did well is not pass in the list of task specs to the generator task, which still could be an iterator (if we want to get fancy and do lazy execution).
I think the fact that the code was so easy to write/simple indicates the abstractions are right (for this problem at least).
Note you could add node-level if you're not using Ray. You could also have it say which task you're running now
Anyway, open a PR! Things for getting this to production:
post_task_group
is calledrich
and tqdm
) and group them? pip install sf-hamilton[progress_bars]
or something like thatShould be easy to get out. Really appreciate this! And yes @zilto I think we could simplify the caching code with this.
Is your feature request related to a problem? Please describe.
I hit a bit of a snag while creating some custom multi-level progress bar lifecycle adapters for task-based parallel DAGs (with
rich
for the curious). Currently, for task-based DAGs,TaskExecutionHook
will only fire before and after a task is executed. The hooks have no knowledge of the overall task landscape, including:Note: Item 1 was originally discussed on Slack: https://hamilton-opensource.slack.com/archives/C03MANME6G5/p1728403433108319
Describe the solution you'd like
After speaking with @elijahbenizzy, an initial implementation for item 1 was suggested that modifies the
TaskImplementation
object to store the current task index and the total number of tasks. This information would then be wired through various methods in theExecutionState
class and be eventually passed to the lifecycle hooksrun_after_task_execution
andrun_before_task_execution
onTaskExecutionHook
.While implementing the above in a test branch (https://github.com/cswartzvi/hamilton/tree/update_task_execution_hook) I found that it was still difficult to create a multi-level progress bar without some of the information in item 2-5. To that end I also added:
spawning_task_id
andpurpose
to the methods and hooks associated withTaskExecutionHook
post_task_group
that runs after the tasks are groupedpost_task_expand
that runs after the expander task is parameterizedWith these additional changes (also in the branch above) I was able to create my coveted multi-level progress bar:
Maybe I reached a little too far with this for my own selfish goals :smile:, either way please let me know if you would be interested in a PR for any, or all, of the changes to the task lifecycle adapters (heck, I would also be willing to add rich plugins if you like that as well). Thanks!
Additional context
Currently, the build-in lifecycle adapter
ProgressBar
has an indeterminate length for task-based DAGs.