Open pingsutw opened 2 years ago
the only thing is that it seems skyplane needs to launch a machine
we can explore this more this quarter as we work on the data story but i think arrowfs should already be a lot better. but i'll spend some time playing around with this after that work is done.
Hello 👋, This issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏
Hello 👋, This issue has been inactive for over 9 months and hasn't received any updates since it was marked as stale. We'll be closing this issue for now, but if you believe this issue is still relevant, please feel free to reopen it. Thank you for your contribution and understanding! 🙏
Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable. Thank you for your contribution and understanding! 🙏
Hi, can I work on this issue? @davidmirror-ops
@Nupoor10 assigned. Building a flytekit plugin is a significant effort and I think you'll have questions, so please drop them here or head to the hacktoberfest channel. Thank you!
Hi @davidmirror-ops, I tried to join the Slack channel but I am getting this message: Contact the workspace administrator at Flyte for an invitation.
@Nupoor10 sorry, I think you have to join the workspace first (no need for invite)
Hi, I will not be able to work on this issue due to the time crunch. I am unassigning myself so that other contributors can work on this. Thanks for the opportunity!
@Nupoor10 thanks for taking the time and letting us know!
I’d like to take up this issue to develop the Skyplane Flytekit plugin. #take
Hi,
I have submitted this PR as an initial implementation of the Skyplane plugin integration. At this stage, I’ve structured the plugin, created the SkyplaneJob and SkyplaneFunctionTask classes, and registered the plugin with Flyte.
I’d greatly appreciate any feedback or guidance to ensure that I’m on the right track with this implementation, especially regarding the integration with Skyplane’s data transfer features. I’ll continue to refine the code and address any changes based on the team’s suggestions.
In the next iteration, I plan to implement error handling and logging to provide better feedback during data transfer operations. Are there any established best practices or patterns for error handling and logging in Flyte plugins that I should consider?
Also, some older comments mention launching machines and an Apache Arrow based file system (I believe that's what "arrowfs" means there; please correct me if I'm wrong). How should I approach those?
Looking forward to your insights!
Motivation: Why do you think this is important?
Skyplane makes transferring data much faster and cheaper. Currently, we use awscli to upload and download data from S3, and awscli's I/O performance is poor. We can probably reduce I/O overhead by replacing awscli with Skyplane.
We could add a Skyplane flytekit plugin first, and extend DataPersistence to implement a new persistence plugin. Finally, we should test it to see how much time we can save on I/O.
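The "test it to see how much time we can save" step could be prototyped outside flytekit with a small harness like the one below. This is only a sketch: `build_command` and `time_transfer` are hypothetical helper names, not flytekit or Skyplane APIs. The only external assumptions are the `aws s3 cp` and `skyplane cp` command-line forms, and that both CLIs are installed and configured.

```python
import subprocess
import time
from typing import Callable, List


def build_command(backend: str, src: str, dst: str, recursive: bool = False) -> List[str]:
    """Build the CLI invocation for one transfer (hypothetical helper)."""
    if backend == "awscli":
        cmd = ["aws", "s3", "cp"]
        if recursive:
            cmd.append("--recursive")
    elif backend == "skyplane":
        # Assumes the `skyplane cp` subcommand; note that Skyplane may
        # launch gateway machines, which adds startup time to small copies.
        cmd = ["skyplane", "cp"]
        if recursive:
            cmd.append("-r")
    else:
        raise ValueError(f"unknown backend: {backend}")
    return cmd + [src, dst]


def time_transfer(
    cmd: List[str],
    runner: Callable[[List[str]], object] = subprocess.check_call,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Run one transfer command and return the elapsed wall-clock seconds."""
    start = clock()
    runner(cmd)
    return clock() - start
```

Calling `time_transfer(build_command("awscli", src, dst))` and `time_transfer(build_command("skyplane", src, dst))` on the same data would give a first rough comparison. `runner` and `clock` are injectable so the harness can be exercised without touching any cloud account.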
Goal: What should the final outcome look like, ideally?
Use Skyplane to upload and download Flyte literals by default when the Skyplane plugin is installed.
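One way the "default if installed" behaviour might look is sketched below. The module name `flytekitplugins.skyplane` is an assumption based on the usual flytekit plugin namespace, and `is_installed` and `pick_backend` are hypothetical helpers rather than existing flytekit functions.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the module can be found without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises when a parent package of a dotted name is missing.
        return False


def pick_backend(plugin_module: str = "flytekitplugins.skyplane") -> str:
    """Prefer Skyplane when its plugin is present, else fall back to awscli."""
    return "skyplane" if is_installed(plugin_module) else "awscli"
```

With this shape, users who never install the plugin keep the current awscli behaviour unchanged.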
Describe alternatives you've considered
Use awscli, which is what we already have now.
Propose: Link/Inline OR Additional context
Skyplane: 110x faster data transfers on any cloud