flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0
5.72k stars 646 forks

[Docs] improve docs on eager workflow remote execution #5288

Open vttrifonov opened 6 months ago

vttrifonov commented 6 months ago

Description

I am very excited to learn about eager workflows! Unfortunately, so far I have not managed to make them work for me outside of sandbox and local.

The documentation focuses mostly on sandbox/local execution, and for remote execution there is just a blurb. Throughout the doc it seems like @eager() is enough to decorate eager workflows, but it seems that for remote workflows @eager(remote=...) needs to be included everywhere (?). I understand (vaguely) why this is, but it does not make for good-looking code... This is not needed for @task and @workflow, so the natural expectation is that it should not be needed for @eager either. In any case, a more complete example of eager sub-workflows with remote execution in mind would be nice.
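One possible way to avoid repeating the remote arguments is to bind them once with functools.partial and reuse the resulting decorator. The sketch below is only an illustration of that pattern: the eager function defined here is a local stand-in (not flytekit's implementation) so the example runs without a cluster, its keyword names follow the ones mentioned in the docs quoted in this issue, and remote_eager, add_one, parent and all placeholder values are hypothetical. Whether the real decorator accepts being wrapped this way is an assumption that would need checking.

```python
import asyncio
import functools


# Stand-in for flytekit.experimental.eager so the sketch runs anywhere.
# It only records its settings; it does NOT talk to a Flyte cluster.
def eager(_fn=None, *, remote=None, client_secret_group=None, client_secret_key=None):
    def decorate(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            return await fn(*args, **kwargs)

        # Keep the settings on the function so they can be inspected.
        wrapper.eager_settings = {
            "remote": remote,
            "client_secret_group": client_secret_group,
            "client_secret_key": client_secret_key,
        }
        return wrapper

    # Support both @eager and @eager(...) usage.
    return decorate if _fn is None else decorate(_fn)


# Bind the remote settings once. The values are placeholders; in practice
# they would come from whoever administers the cluster.
remote_eager = functools.partial(
    eager,
    remote="<FlyteRemote built from config.yaml>",
    client_secret_group="<my_client_secret_group>",
    client_secret_key="<my_client_secret_key>",
)


@remote_eager
async def add_one(x: int) -> int:
    return x + 1


@remote_eager
async def parent(x: int) -> int:
    # Eager sub-workflow call, awaited like any other coroutine.
    y = await add_one(x=x)
    return y * 2


result = asyncio.run(parent(x=3))
print(result)
```

With this pattern the remote configuration lives in one place, so the decorated functions look the same as in the local case apart from the decorator name.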

The second issue I ran into is that even when remote is set, it turns out I also need 'secret_group' and 'secret_key' (otherwise an "AssertionError: secret_group must be defined when using a remote cluster" pops up). This is where my attempts at using this feature have stalled. The documentation makes it sound like the person who writes the workflows (me) is the same as the person who sets up the cluster (a very busy team of devops in our case). Other than in toy examples, this is not true in general. For example, the doc says:

Where config.yaml contains a flytectl-compatible config file and my_client_secret_group and my_client_secret_key are the secret group and key that you’ve configured for your Flyte cluster to authenticate via a client key.

When I read this initially, I thought that I needed to set up the cluster somehow (it sounds like my_client..., I've configured, my cluster, etc.), but upon reading the secrets and authentication page (which should be linked from the eager workflow doc) it became clearer that this is something that perhaps devops needs to do, and then I get it from them. Not sure yet... I do not know much about setting up Kubernetes clusters, but the overall impression is that to run @eager I now also need devops to intervene somehow.

Overall, @eager is a great feature that needs to crystallize further. I am hoping my comments here will help a bit. Ideally, the remotely executing code should look (mostly) like the local one. In the meantime, somewhat better documentation (for non-devops people) on what it takes to set up and run remotely would be helpful.

welcome[bot] commented 6 months ago

Thank you for opening your first issue here! 🛠

runllm[bot] commented 6 months ago

Thank you for your detailed feedback on the documentation for eager workflows in Flyte, especially regarding remote execution. Your points highlight important areas for improvement in clarity and usability for users who are not directly involved in cluster management.

  1. Remote Execution Decorator: You're correct that for remote execution of eager workflows, the @eager(remote=...) decorator is necessary. This is different from @task and @workflow, which might indeed lead to confusion. A clearer explanation and justification for this requirement in the documentation could help align user expectations.

  2. Secrets Requirement: The need for secret_group and secret_key in remote configurations is a crucial point that should be better addressed in the documentation. It's important to clarify the roles and responsibilities between developers and DevOps, ensuring that developers understand what is required from them and what needs to be set up by the DevOps team. This separation of concerns isn't currently well-defined in the documentation.

  3. Documentation Links and Clarity: Including direct links to relevant sections like secrets management and authentication within the eager workflow documentation can provide a smoother learning curve and reduce confusion.

  4. Examples and Templates: Providing complete examples that include eager sub-workflows configured for remote execution could serve as valuable templates for users to adapt to their needs.

Your suggestions for making the remote execution code resemble local execution as closely as possible are insightful. Enhancing the documentation to better cater to non-DevOps personnel will indeed make the feature more accessible and easier to adopt.

Would you like more detailed guidance on how to configure and use eager workflows remotely, including code examples and configuration details? This could potentially help you move forward with your implementation while the documentation is being improved.

This is an AI-generated response and your feedback is appreciated! Please leave a 👍 if this is helpful and 👎 if it is not.

kumare3 commented 6 months ago

@vttrifonov thank you for the feedback. We are aware of this; we have not yet dedicated enough resources to eager workflows. We will get to them soon, later this year (Q3-ish).

If not, we would love your help. There are a few major issues:

  1. UI support: eager workflows cannot be visualized well in the UI.
  2. Auth delegation: currently it is not possible to delegate auth to the running container.
  3. Lightweight failure recovery / state saving: currently it does recover from failures, but that involves consulting the remote. We have ideas for making it fast with a local state checkpoint.