flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0
5.81k stars, 660 forks

[Core feature] Skyplane flytekit plugin #3005

Open pingsutw opened 2 years ago

pingsutw commented 2 years ago

Motivation: Why do you think this is important?

Skyplane makes transferring data much faster and cheaper. Currently, we use awscli to upload and download data from S3, and awscli's I/O performance is poor. We can probably reduce I/O overhead if we replace awscli with Skyplane.

We could add a Skyplane flytekit plugin first, and extend DataPersistence to implement a new persistence plugin. Finally, we should test it to see how much time we can save on I/O.
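The plan above (a new persistence plugin, then a measurement of the I/O time saved) could be sketched roughly as follows. This is a minimal illustration, not flytekit's actual `DataPersistence` interface: `PersistenceBackend`, `LocalCopyBackend`, and `time_round_trip` are hypothetical names, and the local file copy only stands in for a real awscli- or Skyplane-backed transfer.

```python
import shutil
import tempfile
import time
from abc import ABC, abstractmethod
from pathlib import Path


class PersistenceBackend(ABC):
    """Hypothetical plugin interface (an assumption, not flytekit's API)."""

    @abstractmethod
    def put(self, local_path: str, remote_path: str) -> None:
        """Upload local_path to remote_path."""

    @abstractmethod
    def get(self, remote_path: str, local_path: str) -> None:
        """Download remote_path to local_path."""


class LocalCopyBackend(PersistenceBackend):
    """Placeholder backend: copies files on local disk instead of calling a cloud tool."""

    def put(self, local_path: str, remote_path: str) -> None:
        Path(remote_path).parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(local_path, remote_path)

    def get(self, remote_path: str, local_path: str) -> None:
        Path(local_path).parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(remote_path, local_path)


def time_round_trip(backend: PersistenceBackend, payload: bytes) -> float:
    """Return the seconds taken to upload and then download payload."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src.bin"
        remote = Path(tmp) / "remote" / "obj.bin"
        dst = Path(tmp) / "dst.bin"
        src.write_bytes(payload)

        start = time.perf_counter()
        backend.put(str(src), str(remote))
        backend.get(str(remote), str(dst))
        elapsed = time.perf_counter() - start

        # Guard against a backend that is fast only because it is wrong.
        if dst.read_bytes() != payload:
            raise RuntimeError("round trip corrupted the payload")
    return elapsed
```

Running `time_round_trip` with the same payload against an awscli-backed and a Skyplane-backed implementation of the same interface would give a like-for-like comparison of I/O time.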

Goal: What should the final outcome look like, ideally?

Use Skyplane to upload / download the flyte literal by default if people install the Skyplane plugin.
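One way to get this "default when installed" behaviour is to check whether the optional package is importable and fall back otherwise. The sketch below assumes the plugin's importable module is named `skyplane` and uses plain strings as backend labels; both are illustrative assumptions, not how flytekit selects its persistence layer.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def choose_transfer_backend(optional_module: str = "skyplane",
                            fallback: str = "awscli") -> str:
    """Prefer the optional backend when its module is installed, else fall back."""
    return optional_module if is_installed(optional_module) else fallback
```

For example, `choose_transfer_backend()` returns `"skyplane"` only in an environment where that module is installed, and `"awscli"` everywhere else, so existing users see no change.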

Describe alternatives you've considered

Use awscli, which is what we already have now.

Propose: Link/Inline OR Additional context

Skyplane: 110x faster data transfers on any cloud

kumare3 commented 2 years ago

the only thing is that it seems skyplane needs to launch a machine

wild-endeavor commented 1 year ago

we can explore this more this quarter as we work on the data story but i think arrowfs should already be a lot better. but i'll spend some time playing around with this after that work is done.

github-actions[bot] commented 1 year ago

Hello 👋, This issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏

github-actions[bot] commented 1 year ago

Hello 👋, This issue has been inactive for over 9 months and hasn't received any updates since it was marked as stale. We'll be closing this issue for now, but if you believe this issue is still relevant, please feel free to reopen it. Thank you for your contribution and understanding! 🙏

github-actions[bot] commented 4 months ago

Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable. Thank you for your contribution and understanding! 🙏

Nupoor10 commented 1 month ago

Hi, can I work on this issue? @davidmirror-ops

davidmirror-ops commented 1 month ago

@Nupoor10 assigned. Building a flytekit plugin is a significant effort and I think you'll have questions, so please drop them here or head to the hacktoberfest channel. Thank you!

Nupoor10 commented 1 month ago

Hi @davidmirror-ops, I tried to join the Slack channel but I am getting this message: "Contact the workspace administrator at Flyte for an invitation."

davidmirror-ops commented 1 month ago

@Nupoor10 sorry, I think you have to join the workspace first (no need for invite)

Nupoor10 commented 1 month ago

Hi, I will not be able to work on this issue due to a time crunch. I am unassigning myself so that other contributors can work on it. Thanks for the opportunity!

davidmirror-ops commented 1 month ago

@Nupoor10 thanks for taking the time and letting us know!

10sharmashivam commented 1 month ago

I’d like to take up this issue to develop the Skyplane Flytekit plugin. #take

10sharmashivam commented 1 month ago

Hi,

I have submitted this PR as an initial implementation of the Skyplane plugin integration. At this stage, I’ve structured the plugin, created the SkyplaneJob and SkyplaneFunctionTask classes, and registered the plugin with Flyte.

I’d greatly appreciate any feedback or guidance to ensure that I’m on the right track with this implementation, especially regarding the integration with Skyplane’s data transfer features. I’ll continue to refine the code and address any changes based on the team’s suggestions.

In the next iteration, I plan to implement error handling and logging to provide better feedback during data transfer operations. Are there any established best practices or patterns for error handling and logging in Flyte plugins that I should consider?

Also, some older comments mention launching machines and an Apache Arrow based file system (I believe that’s what "arrowfs" refers to; please correct me if I am wrong). How should I approach these?

Looking forward to your insights!
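For the error handling and logging question in the comment above, one generic pattern is a retry wrapper that logs each failed attempt and re-raises once attempts run out. This is a sketch only: it is not taken from the PR or from Flyte's codebase, and `run_with_retries`, its parameters, and the choice of `OSError` as the retryable error are illustrative assumptions.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("transfer_sketch")  # hypothetical logger name


def run_with_retries(operation: Callable[[], T],
                     attempts: int = 3,
                     base_delay_s: float = 0.0) -> T:
    """Call operation, retrying on OSError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            result = operation()
            logger.info("transfer succeeded on attempt %d", attempt)
            return result
        except OSError as exc:
            logger.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise  # surface the last error to the caller
            # Delay doubles after each failure: base, 2*base, 4*base, ...
            time.sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

A transfer call would be wrapped as `run_with_retries(lambda: do_transfer(src, dst))`, where `do_transfer` is whatever function performs the copy. Errors other than `OSError` propagate immediately, which keeps programming mistakes from being retried silently.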