flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0
5.78k stars 659 forks source link

[Core feature] [flytekit] Increase laziness of SerializationSettings #2647

Open wild-endeavor opened 2 years ago

wild-endeavor commented 2 years ago

Goal

Correctly identify when in the FlyteRemote experience, where the SerializationSettings object is required and update the code so that the user's experience is such that they only have to specify it when it's actually needed.

Background

Flyte Remote

FlyteRemote is the object defined in this file. It provides programmatic control plane access to Flyte with a better ux than using the raw protobuf generated clients. This object is meant for users who want to talk to Flyte Admin (the Flyte control plane) but do not necessarily have workflow/task code checked out locally. (To clarify, flytekit of course knows how to parse user declared @task and @workflow user-code. But this portion of flytekit is independent of that, it does not need to interact with user code directly, though it can.)

Translator and SerializationSettings

Flyte is based on Flyte IDL - these are the protobuf messages that define everything Flyte, from what a Task is, to the parallelism to run inside a workflow, to plugins. Flytekit is a Python SDK that understands Python objects. The translator.py is the piece of code that links the two.

Portability and settings

Some time ago, we realized something. There are broadly two types of user provided "settings". One set are things like the k8s service account or aws IAM role to use, the notifications to send when a workflow succeeds, or the S3 or GCS bucket to write your dataframe to - these settings don't have to deal with the tasks directly. They're more an artifact of how you want them run. Two people can run the same task differently. On the other hand you have settings like a task's container image. These are non-changing, they define and are part of the task itself. The former are encapsulated in the Options object and the latter are encapsulated in the SerializationSettings object.

Complicating things a bit maybe is the fact that some of these are mixed (project/domain/version) which exists in the serialization object but they can be overridden (like when you register with flytectl), you can always pick a different project. The way this works is that when serializing, we use a placeholder that flytectl (or flytekit) then replaces.

Remote and local tasks

Complicating things further is the fact that FlyteRemote works with both local and remote tasks. What are these? A 'local' task (not a formal term in Flyte parlance) is a Python object built from an @task decorated function (or something like that - there are flytekit tasks that don't rely on functions but you get the idea... a local task is a task for which you have the code checked out and accessible to that Python process). A remote task is a FlyteTask... it's what gets created when you call my_remote.fetch_task(). It's a Python object that encapsulates the Flyte backend's understanding of the task, but you don't have the code checked out locally.

Currently in Flytekit, you can use a remote object to fetch a couple tasks, and then make a new workflow from it, and then register that workflow... in this scenario, you shouldn't need the SerializationSettings - why? Because the tasks are already defined, and the creation of the workflow (the dag basically) does not depend on it.

Details.

This line here would be a good place to start. This is one (the only??) example in the UX, where the user is being asked to provide the serialization object even in cases when it might not be needed. This is the main ask for this issue, go through the FlyteRemote APIs, and delay the asking for the SerializationSettings object until it is needed. We can raise an exception when this is required but not provided.

Since we've already committed to this API we can't really change some of the function signatures so you'll need to be cognizant of that.

Misc

Are you sure this issue hasn't been raised already?

Have you read the Code of Conduct?

github-actions[bot] commented 1 year ago

Hello πŸ‘‹, This issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! πŸ™

github-actions[bot] commented 1 year ago

Hello πŸ‘‹, This issue has been inactive for over 9 months and hasn't received any updates since it was marked as stale. We'll be closing this issue for now, but if you believe this issue is still relevant, please feel free to reopen it. Thank you for your contribution and understanding! πŸ™

github-actions[bot] commented 1 month ago

Hello πŸ‘‹, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable. Thank you for your contribution and understanding! πŸ™