flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0

[Plugin][Flytekit] Support for TFRecord as loadable schema type #1144

Open kumare3 opened 3 years ago

kumare3 commented 3 years ago

Why would this plugin be helpful to the Flyte community

Users often process data with Spark and then pass it to a TensorFlow training process. Parquet and other columnar formats are highly inefficient for training. The TensorFlow community has done some work to solve this problem. It would be wonderful if we could perform this conversion automatically, depending on the context.

For example, if the user accepts TFRecord data as a Spark DataFrame, we can convert it on load. Similarly, if the user writes a Spark DataFrame but annotates it as TFRecord, we can auto-convert it on write. Likewise, if the user reads a Spark DataFrame into a process as TFRecords, we can do the conversion at that point.
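
The context-dependent conversion described above could be sketched as a small converter registry keyed on the stored format and the format the consumer asks for. This is only an illustration of the idea: the names `register`, `resolve`, `PARQUET`, and `TFRECORD` are hypothetical and are not Flytekit APIs, and the converter body is a placeholder for the actual Spark job.

```python
from typing import Callable, Dict, Tuple

# Hypothetical format tags; these are not Flytekit identifiers.
PARQUET = "parquet"
TFRECORD = "tfrecord"

# A converter takes the URI of the stored data and returns the URI of the converted data.
Converter = Callable[[str], str]
_CONVERTERS: Dict[Tuple[str, str], Converter] = {}


def register(src: str, dst: str) -> Callable[[Converter], Converter]:
    """Register a converter from format `src` to format `dst`."""
    def wrap(fn: Converter) -> Converter:
        _CONVERTERS[(src, dst)] = fn
        return fn
    return wrap


def resolve(stored: str, requested: str, uri: str) -> str:
    """Return a URI that holds the data in the requested format."""
    if stored == requested:
        return uri  # formats already match, nothing to convert
    try:
        convert = _CONVERTERS[(stored, requested)]
    except KeyError:
        raise TypeError(f"no converter from {stored} to {requested}") from None
    return convert(uri)


@register(PARQUET, TFRECORD)
def parquet_to_tfrecord(uri: str) -> str:
    # Placeholder only: a real plugin would run the Spark conversion here.
    return uri + ".tfrecord"
```

With a registry like this, the type annotation on a task input or output would pick the `requested` format, and the conversion would run only when it differs from what was stored.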

This library provides this capability: https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-connector. LinkedIn has further updated this library to improve it in some ways: https://github.com/linkedin/spark-tfrecord
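
For reference, the write and read paths a plugin might wrap could look roughly like the sketch below. It assumes the linkedin/spark-tfrecord data source is on the Spark classpath and that it is exposed under the format name `tfrecord` with a `recordType` option, as I recall from that project's README; please verify against the version you use. The helper names `write_tfrecord` and `read_tfrecord` are hypothetical, and the functions are duck-typed so they do not import PySpark themselves.

```python
from typing import Any


def write_tfrecord(df: Any, path: str, record_type: str = "Example") -> None:
    """Write a Spark DataFrame as TFRecord files (assumes the spark-tfrecord data source)."""
    (
        df.write.mode("overwrite")
        .format("tfrecord")  # assumed data source name from spark-tfrecord
        .option("recordType", record_type)  # assumed option name
        .save(path)
    )


def read_tfrecord(spark: Any, path: str, record_type: str = "Example") -> Any:
    """Load TFRecord files back into a Spark DataFrame using the same data source."""
    return (
        spark.read.format("tfrecord")
        .option("recordType", record_type)
        .load(path)
    )
```

Here `df` would be a `pyspark.sql.DataFrame` and `spark` a `SparkSession`. A Flytekit schema transformer could call helpers like these when the declared type is TFRecord.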

Type of Plugin

Can you help us with the implementation?

kumare3 commented 3 years ago

This can be a good issue for beginners who understand Spark, TFRecords, and Flyte schemas.

github-actions[bot] commented 1 year ago

Hello 👋, This issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏

github-actions[bot] commented 1 year ago

Hello 👋, This issue has been inactive for over 9 months and hasn't received any updates since it was marked as stale. We'll be closing this issue for now, but if you believe this issue is still relevant, please feel free to reopen it. Thank you for your contribution and understanding! 🙏

github-actions[bot] commented 3 months ago

Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable. Thank you for your contribution and understanding! 🙏