Open EngHabu opened 9 months ago
š¤
Hello @EngHabu!
I'm Dosu, a friendly bot here to assist you while we wait for a human maintainer to chime in. I'm here to help you solve bugs, answer questions, and navigate your way into becoming a contributor. Just remember, I'm a bot, but I'm here to make your interaction with this repository as smooth as possible.
Let's get cracking on this!
Thank you for your detailed request. I agree that passing additional metadata in inputs/outputs of tasks/workflows can be a valuable feature for Flyte.
As you've mentioned, this is a complex change due to the system's assumption that Inputs and Outputs are always LiteralMaps and the need for backward compatibility.
To implement this feature, we can extend the existing protobuf definitions to include additional fields for metadata. For example, in the TaskMetadata
struct, we could add a new field for metadata:
type TaskMetadata struct {
// existing fields...
Metadata map[string]string `protobuf:"bytes,11,rep,name=metadata,proto3" json:"metadata,omitempty" protobuf_key:"bytes,1,opt,name=key,proto3" protobuf_val:"bytes,2,opt,name=value,proto3"`
}
This would allow us to pass arbitrary key-value pairs as metadata. The same can be done for workflows.
To ensure backward compatibility, we would need to make sure that any code that reads these protobuf messages can handle the case where the new Metadata
field is not present. This can be done by checking if the field is nil
before trying to access it:
func (m *TaskMetadata) GetMetadata() map[string]string {
if m != nil {
return m.Metadata
}
return nil
}
For a smooth rollout, we could first deploy a version of your code that can handle the new Metadata
field but does not yet produce it. Once all systems are running the new code, we can start producing messages with the new field.
Please note that this is a high-level overview of the changes that would be needed. The actual implementation might require additional changes depending on the specifics of your system.
I hope this helps! If you have any further questions or need clarification on any points, please don't hesitate to ask.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a š if this is helpful and š if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
Motivation: Why do you think this is important?
There are a few instances we faced in the past where we wanted to pass additional contextual information about the inputs/outputs of a task that are not themselves inputs/outputs (The most recent example is flyte decks). Unfortunately the way we had to pass that is by convention. The engine (i.e. Propeller) assumes a task writes decks, checks for the file existence and bails if it's not there. Conventions are not very strong :-) and limit the optimizations the system can make. Another instance of this is executing, generating and exporting input validation report from Great Expectations or Pandera. This is not really an output but rather metadata about the input/output dataframes.
Goal: What should the final outcome look like, ideally?
This is a complex change and its complexity stems from a few aspects:
Here is a proposed breakdown of the work particularly around the component boundaries:
flyteidl
and all backend code. This code should default to reading the newInputData
andOutputData
and fallback toLiteralMap
if unmarshaling fails, this will ensure oldpropeller
can continue to work with a neweradmin
(but not vis versa)flytepropeller
writesinputs.pb
for a task, it should gate that behavior based on the version of flytekit. If it's not aflytekit
runtime or it's an old version, it should default to writing the old format until other SDKs are updated.flytekit
to do the same.flyteconsole
to do the same.The rollout plan is:
As long as flyteadmin is updated first, the system assumptions will hold.
Describe alternatives you've considered
Continue to use conventions to pass data... not good
Propose: Link/Inline OR Additional context
No response
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?