Open wild-endeavor opened 2 years ago
Hello π, This issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! π
Hello π, This issue has been inactive for over 9 months and hasn't received any updates since it was marked as stale. We'll be closing this issue for now, but if you believe this issue is still relevant, please feel free to reopen it. Thank you for your contribution and understanding! π
Hello π, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable. Thank you for your contribution and understanding! π
Goal
Correctly identify when in the
FlyteRemote
experience, where theSerializationSettings
object is required and update the code so that the user's experience is such that they only have to specify it when it's actually needed.Background
Flyte Remote
FlyteRemote
is the object defined in this file. It provides programmatic control plane access to Flyte with a better ux than using the raw protobuf generated clients. This object is meant for users who want to talk to Flyte Admin (the Flyte control plane) but do not necessarily have workflow/task code checked out locally. (To clarify, flytekit of course knows how to parse user declared@task
and@workflow
user-code. But this portion of flytekit is independent of that, it does not need to interact with user code directly, though it can.)Translator and
SerializationSettings
Flyte is based on Flyte IDL - these are the protobuf messages that define everything Flyte, from what a Task is, to the parallelism to run inside a workflow, to plugins. Flytekit is a Python SDK that understands Python objects. The translator.py is the piece of code that links the two.
Portability and settings
Some time ago, we realized something. There are broadly two types of user provided "settings". One set are things like the k8s service account or aws IAM role to use, the notifications to send when a workflow succeeds, or the S3 or GCS bucket to write your dataframe to - these settings don't have to deal with the tasks directly. They're more an artifact of how you want them run. Two people can run the same task differently. On the other hand you have settings like a task's container image. These are non-changing, they define and are part of the task itself. The former are encapsulated in the Options object and the latter are encapsulated in the
SerializationSettings
object.Complicating things a bit maybe is the fact that some of these are mixed (project/domain/version) which exists in the serialization object but they can be overridden (like when you register with flytectl), you can always pick a different project. The way this works is that when serializing, we use a placeholder that flytectl (or flytekit) then replaces.
Remote and local tasks
Complicating things further is the fact that FlyteRemote works with both local and remote tasks. What are these? A 'local' task (not a formal term in Flyte parlance) is a Python object built from an
@task
decorated function (or something like that - there are flytekit tasks that don't rely on functions but you get the idea... a local task is a task for which you have the code checked out and accessible to that Python process). A remote task is aFlyteTask
... it's what gets created when you callmy_remote.fetch_task()
. It's a Python object that encapsulates the Flyte backend's understanding of the task, but you don't have the code checked out locally.Currently in Flytekit, you can use a remote object to
fetch
a couple tasks, and then make a new workflow from it, and then register that workflow... in this scenario, you shouldn't need theSerializationSettings
- why? Because the tasks are already defined, and the creation of the workflow (the dag basically) does not depend on it.Details.
This line here would be a good place to start. This is one (the only??) example in the UX, where the user is being asked to provide the serialization object even in cases when it might not be needed. This is the main ask for this issue, go through the FlyteRemote APIs, and delay the asking for the
SerializationSettings
object until it is needed. We can raise an exception when this is required but not provided.Since we've already committed to this API we can't really change some of the function signatures so you'll need to be cognizant of that.
Misc
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?