Open gjoseph92 opened 2 years ago
Shouldn't this be better addressed in the graph construction layer, e.g. why can optimize_blockwise
not set "sane" key names? Would this be something users interact with directly?
Indeed it should be dealt with properly by graph construction, and optimize_blockwise
should be responsible for setting "sane" key names.
However, the design of HLGs makes this very difficult. See https://github.com/dask/dask/issues/8635 for details, but the gist is: if you rename a layer, you then need to change the key-names that all of its dependents are referencing. There isn't an API on Layer
s to be able to do this generically—if the dependent of a Blockwise
is a low-level dict layer, it's easy, but what if it's a CustomUserLayer
? Hence why leaving the layer names unchanged is kind of the only approach optimize_blockwise
can take right now without adding new APIs.
So I proposed this as a pragmatic workaround to improve UX (and possibly scheduling/stealing a little). I also don't like having to make even more concessions on the scheduler for problems with HLGs. I'd be fine not doing it and saying it needs to fixed at the HLG level, I just doubt it'll be fixed anytime soon in that case. Which is okay, it's not that big of a deal.
Allowing tasks to specify a custom TaskGroup and TaskPrefix key—via annotations—would give us an easy path to work around the fact that Blockwise fusion produces task names that aren't meaningful to users, and can be confusing: https://github.com/dask/dask/issues/8635.
In
update_graph
:task-group-key
annotation is set for a task, support passing it in as the group key instead of callingkey_split_group
on the task name.task-prefix-key
annotation is set for a task, use its value as the prefix key instead of callingkey_split
on the task name.task-group-key
is given, but nottask-prefix-key
, calculate the prefix key fromtask-group-key
(this would require a new function likekey_split_from_group
). Not really necessary, just nice to not have to pass the semi-redundant annotation.In
update_graph
, if the TaskState is already in memory, thetask-group-key
andtask-prefix-key
annotations can be ignored, whether or not they match the group/prefix already associated with that task.Once this is implemented, then we can have
optimize_blockwise
set these annotations accordingly.