Open kumare3 opened 3 years ago
This can be a good issue for beginners who understand Spark, TFRecords and Flyte schemas
Why would this plugin be helpful to the Flyte community
Users often want to process data with Spark and then pass it to a TensorFlow training process. Parquet and other columnar formats are highly inefficient for training. The TF community has done some work to solve this problem. It would be wonderful if we could perform this conversion automatically depending on the context.
For example, if the user accepts a TFRecord (data format) as a Spark dataframe, then we can convert. Similarly, if the user writes a Spark dataframe but annotates it as TFRecord, then we can auto-convert. Likewise, if the user reads the Spark dataframe into a process as TFRecords, we can do the conversion.
This library provides this capability: https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-connector
LinkedIn has further updated this library to make it better in some ways: https://github.com/linkedin/spark-tfrecord
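To make the idea concrete, here is a rough sketch of how the write side could pick a Spark data source from a declared format. It assumes the linkedin/spark-tfrecord jar is on the Spark classpath, which registers the `tfrecord` source and accepts a `recordType` option. `DeclaredFormat`, `spark_source_for` and `write_dataframe` are hypothetical names made up for illustration, not existing Flyte or flytekit APIs, and how the annotation would actually be expressed in Flyte is left open.

```python
from enum import Enum
from typing import Dict, Tuple


class DeclaredFormat(Enum):
    """Hypothetical annotation a user might attach to a dataframe output."""
    PARQUET = "parquet"
    TFRECORD = "tfrecord"


def spark_source_for(declared: DeclaredFormat) -> Tuple[str, Dict[str, str]]:
    """Map the declared format to a Spark data source name and its options."""
    if declared is DeclaredFormat.TFRECORD:
        # Assumption: spark-tfrecord is installed; "Example" stores each row
        # as a tf.train.Example record.
        return "tfrecord", {"recordType": "Example"}
    return "parquet", {}


def write_dataframe(df, path: str, declared: DeclaredFormat) -> str:
    """Write a Spark DataFrame to `path` in the declared format.

    `df` only needs to expose the standard DataFrameWriter chain
    (`write.format(...).option(...).mode(...).save(...)`).
    Returns the data source name that was used.
    """
    source, options = spark_source_for(declared)
    writer = df.write.format(source)
    for key, value in options.items():
        writer = writer.option(key, value)
    writer.mode("overwrite").save(path)
    return source
```

With a real Spark session, `write_dataframe(df, "s3://bucket/out", DeclaredFormat.TFRECORD)` would produce TFRecord files that a TensorFlow input pipeline can read directly, while the Parquet branch keeps today's behaviour. The read side could be handled symmetrically with `spark.read.format(source)`.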
Type of Plugin
Can you help us with the implementation?