We are deploying Monolith in our environment.
We manage our data with PySpark, so we usually have a PySpark DataFrame as the data input.
In the demos, Monolith can load data from TFDS or Kafka.
I was wondering: can Monolith support loading data from a PySpark DataFrame or an HDFS directory?
Or do we have to dump files from PySpark to local disk so that Monolith can load them?
tf.io.gfile can read from HDFS. You can either convert your files to the TFRecord format or write a custom TensorFlow dataset kernel that reads data in your own format.
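As a minimal sketch of the TFRecord route (not from the original answer): write the PySpark DataFrame to TFRecord files on HDFS, then stream them into tf.data via an hdfs:// URI. The HDFS paths, feature names, and the use of the spark-tensorflow-connector package are assumptions for illustration; TensorFlow must also be configured with HDFS support for tf.io.gfile to resolve hdfs:// URIs.

```python
from pyspark.sql import SparkSession
import tensorflow as tf

spark = SparkSession.builder.appName("export-tfrecord").getOrCreate()

# Hypothetical input; replace with however your DataFrame is produced.
df = spark.read.parquet("hdfs:///data/training/parquet")

# Requires the spark-tensorflow-connector jar on the Spark classpath.
(df.write
   .format("tfrecord")
   .option("recordType", "Example")
   .mode("overwrite")
   .save("hdfs:///data/training/tfrecord"))

# On the TensorFlow side, tf.io.gfile understands hdfs:// URIs, so the
# shards can be listed and read without copying them to local disk.
files = tf.io.gfile.glob("hdfs:///data/training/tfrecord/part-*")
dataset = tf.data.TFRecordDataset(files)

# Hypothetical feature spec matching the columns written above.
feature_spec = {
    "user_id": tf.io.FixedLenFeature([], tf.int64),
    "item_id": tf.io.FixedLenFeature([], tf.int64),
    "label":   tf.io.FixedLenFeature([], tf.float32),
}
dataset = dataset.map(lambda x: tf.io.parse_single_example(x, feature_spec))
```

The alternative is a custom tf.data source that parses your existing file format directly, which avoids the TFRecord conversion step but requires writing and maintaining the reader yourself.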