Open JackUrb opened 3 months ago
The issue you're encountering is related to Flyte's handling of complex types, particularly nested dataclasses. According to this GitHub issue, nested dataclasses are not fully supported in Flyte's type system, which can lead to type mismatches during workflow compilation. This limitation forces users to create numerous wrappers for inputs and outputs, complicating the workflow.
Would you like more detail on potential workarounds or further explanation of the issue?
Sources:
This is an AI-generated response and your feedback is appreciated! Please leave a 👍 if this is helpful and 👎 if it is not.
Mention @runllm to ask follow-up questions. See runllm.com for more!
@JackUrb you are talking about coalescing inputs and reconstructing a json (dataclass internally) from multiple dataclasses. Today flyte does support exploding a dataclass into individual attributes in the engine (hence you can do so), but not combining them. This is more tricky as Flytepropeller does not know the structure of the json to create. What we need is a json template that can be assembled.
We are discussing this internally, would love to see community support. would you be open to contributing?
I'm 100% interested in contributing here. I've been following along on the JSON IDL discussions, but that seems to be just the first step towards something like this being that it creates a fixed scalar rather than a collection type.
Hi @kumare3 - revisiting this now that the upcoming 1.14 release includes the message pack binary from https://github.com/flyteorg/flyte/pull/5742, and is thus a step closer to this functionality.
I imagine the next step forwards from here would be something that includes both the required typing for promises that are included in a dataclass-with-promises as well as an indicator for the FlytePropeller backend that can list the required promises that need to be resolved before the dataclass can be resolved.
On the flytekit
local side, it ultimately ends up as simple as extending the recursive resolver (here: https://github.com/flyteorg/flytekit/pull/2828) to support dataclass as something to expand (as I attempted on an older fork here), but I don't necessarily have a deep enough understanding on where to go poking to replicate the same behavior on the propeller side.
Motivation: Why do you think this is important?
We pass around most of our configuration in flyte via python
@dataclass
as this makes it very easy for us to manage configuration and such. Unfortunately, this also means that we have to write a ton of wrappers for the inputs + outputs of our flyte@task
and@workflow
's, as we are often dereferencing attributes to pass along to other inputs, but need to wrap them (inlist
s and otherdataclass
es). Something like:doesn't end up working, as we can't use the attributes of a promise dataclass nested in containers as inputs.
Instead, we're forced to do things like:
While this doesn't add much to the above toy example, when the dataclasses have many more fields and depth, this gets messy fast. Making changes to a dataclass requires us to update all wrappers for it, the actual visualized graph becomes bloated with wrappers, and ease-of-use goes down.
It's also an issue when trying to wrap a dataclass promise's attribute in a container, such as:
Goal: What should the final outcome look like, ideally?
Overall I imagine the approach is to increase the level of support for Dataclass attributes across flytekit, doing more to:
some_task([task_output.x])
a. Importantly, retain the typing as defined by the dataclass, even when only receiving an attribute (issue for complex types and forint
s)I think this would also resolve issues like https://github.com/flyteorg/flyte/issues/5427
Describe alternatives you've considered
We could use try
@eager
execution mode for this, but that seems to change a lot of the semantics of flyte as well as not have widespread support just yet. Plus, it would require significant refactors throughout the codebase.Propose: Link/Inline OR Additional context
I've taken first stabs at pieces of this:
promise.translate_inputs_to_literals
andbase_task.local_execute
, I think it probably has a significant way to go, but I don't know what I don't know. It's also possible that https://github.com/flyteorg/flytekit/pull/2600 completely resolves this part of my issue, but I'd need to dig more there.binding_data_from_python_std
to iterate over dataclasses that contain promises and resolve them using aBindingDataMap
rather than aBindingData
scalar. This almost works, but if thePromise
's types are not primitive then duringto_literal
during serialization we may not have access to the non-primitive (in a similar issue to what the above attempts to resolve, and what likely causes https://github.com/flyteorg/flyte/issues/5427).Overall I think this is a huge quality-of-life win for using Flyte with dataclasses, and I'm happy to actually work out the implementation, but I feel like I'm missing some pieces and context on the approach.
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?