flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0

[Core feature] Support parallelizing builds using ImageSpec #4671

Open katrogan opened 9 months ago

katrogan commented 9 months ago

Motivation: Why do you think this is important?

Improved end-to-end (E2E) performance for Flyte workflow registration and execution.

Goal: What should the final outcome look like, ideally?

See title

Describe alternatives you've considered

Current serial model
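
As a rough illustration of the difference between the current serial model and the behavior requested here, the sketch below builds several independent images concurrently with a thread pool. `build_image` is a hypothetical stand-in for whatever builds a single ImageSpec (it is not a flytekit API), and the sleep only simulates build time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def build_image(name: str, seconds: float = 0.1) -> str:
    """Hypothetical stand-in for building one image; sleeps to simulate work."""
    time.sleep(seconds)
    return f"{name}:built"


def build_serial(names):
    # Serial model: each build starts only after the previous one finishes.
    return [build_image(n) for n in names]


def build_parallel(names, max_workers: int = 4):
    # Parallel model: independent builds overlap; map() preserves input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(build_image, names))
```

Threads are used here on the assumption that a real build spends most of its time waiting on an external builder or registry rather than on Python code; if that does not hold, a process pool may be the better fit.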

Propose: Link/Inline OR Additional context

No response


dosubot[bot] commented 9 months ago

🤖

Hey there @katrogan! I'm Dosu, a friendly bot here to lend a hand while we wait for a human maintainer. I'm pretty handy with bugs, answering questions, and helping folks like you become contributors. Just think of me as your digital sidekick in this workflow adventure! 🚀

Thank you for your feature request. Parallelizing builds using ImageSpec could indeed improve the end-to-end performance of Flyte workflow registration and execution.

Currently, the build process in Flyte is structured using a Makefile, which includes steps for compilation, Linux compilation, release automation, Helm chart generation, building Docker images, and tidying Go modules. The build process is executed serially, as you've noted.

Parallelism in the Flyte codebase currently relies on Go's goroutines, but only for system-level tasks such as event publishing, not for running user-defined workflows or tasks in parallel.

The mechanism for handling image builds is through GitHub Actions workflows, specifically the workflow defined in the .github/workflows/single-binary.yml file. This workflow includes jobs for testing, building and pushing images, and running functional tests.

To implement your feature request, changes would likely need to be made to both the Makefile and the GitHub Actions workflow. The Makefile would need to be updated to support parallel execution of tasks, and the GitHub Actions workflow would need to be updated to support parallel building and pushing of images.

However, it's important to note that parallelizing builds could introduce new challenges, such as managing dependencies between tasks and ensuring that resources are efficiently utilized. It would also require thorough testing to ensure that the build process remains reliable and that the resulting images are correctly built.
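
The dependency concern can be sketched as follows, assuming some images depend on others (for example, one image serving as the base of another). The `deps` mapping and the `build` callable are hypothetical and not part of Flyte or flytekit. The sketch uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to start each build as soon as everything it depends on has finished.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def build_all(deps, build, max_workers=4):
    """Build every image in `deps` (image -> set of images it depends on).

    `build` is a caller-supplied function taking an image name; it is a
    placeholder for the real build step.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {}  # future -> image name
        while sorter.is_active():
            # Submit every image whose dependencies are already built.
            for name in sorter.get_ready():
                pending[pool.submit(build, name)] = name
            # Block until at least one running build finishes.
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                name = pending.pop(fut)
                results[name] = fut.result()  # re-raises if the build failed
                sorter.done(name)  # unblocks images that depend on `name`
    return results
```

With `deps = {"app": {"base"}, "worker": {"base"}, "base": set()}`, `base` is built first and `app` and `worker` can then build concurrently.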

I hope this provides some insight into how your feature request could be implemented. If you have any further questions or need more information, please let me know.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.