bytedance / monolith

ByteDance's Recommendation System

Can monolith load data from HDFS? #10

Open colinlzh opened 1 year ago

colinlzh commented 1 year ago

We are deploying monolith in our environment. We manage our data with PySpark, so our input is usually a PySpark DataFrame. In the demos, monolith loads data from tfds or Kafka. Can monolith support loading data from a PySpark DataFrame or an HDFS directory? Or do we have to dump files from PySpark to local storage so that monolith can load them?

hanzhi713 commented 1 year ago

`tf.io.gfile` can read from HDFS. You can either convert your files to TFRecord format or write a custom TensorFlow dataset kernel that reads data in your format.
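To make the "convert your files to TFRecord" option concrete, here is a minimal, dependency-free sketch of the TFRecord container framing (per record: uint64 length, masked CRC-32C of the length, the payload bytes, masked CRC-32C of the payload). It assumes uncompressed files, treats each payload as opaque bytes (normally a serialized `tf.train.Example`), and only touches local paths; the function names are hypothetical, and whether monolith's own input pipeline consumes plain TFRecord files directly is not confirmed in this thread. In practice `tf.io.TFRecordWriter` or a Spark TFRecord connector would usually do this for you.

```python
import struct

# CRC-32C (Castagnoli), reflected polynomial 0x82F63B78, as used by TFRecord.
def _make_crc32c_table():
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table

_CRC32C_TABLE = _make_crc32c_table()

def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

def masked_crc(data: bytes) -> int:
    # TFRecord stores a "masked" CRC: rotate right by 15 bits, then add a constant.
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF

def write_tfrecords(path, records):
    """Hypothetical helper: write each bytes payload as one uncompressed TFRecord."""
    with open(path, "wb") as f:
        for payload in records:
            header = struct.pack("<Q", len(payload))         # uint64 length
            f.write(header)
            f.write(struct.pack("<I", masked_crc(header)))   # CRC of the length
            f.write(payload)
            f.write(struct.pack("<I", masked_crc(payload)))  # CRC of the payload

def _read_exact(f, n):
    buf = f.read(n)
    if len(buf) != n:
        raise ValueError("truncated TFRecord file")
    return buf

def read_tfrecords(path):
    """Hypothetical helper: yield payloads from an uncompressed TFRecord file."""
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                return  # clean end of file
            if len(header) != 8:
                raise ValueError("truncated TFRecord header")
            (length,) = struct.unpack("<Q", header)
            (length_crc,) = struct.unpack("<I", _read_exact(f, 4))
            if length_crc != masked_crc(header):
                raise ValueError("corrupt TFRecord length")
            payload = _read_exact(f, length)
            (data_crc,) = struct.unpack("<I", _read_exact(f, 4))
            if data_crc != masked_crc(payload):
                raise ValueError("corrupt TFRecord payload")
            yield payload
```

Once such files are uploaded to HDFS, the TensorFlow side could read them with something like `tf.data.TFRecordDataset(tf.io.gfile.glob("hdfs://namenode:8020/path/to/data/part-*"))`, where the namenode address and path are hypothetical. Reading `hdfs://` paths also needs a working Hadoop client environment (libhdfs and a suitable `CLASSPATH`), and depending on the TensorFlow version it may additionally require the `tensorflow-io` package.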