KGoldsmith11 opened this issue 6 months ago.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Currently, the `UnivariateDriftCalculator` object supports serialization via pickle. However, this format is not compatible with Apache Spark, which I intend to use for processing on the inference side. For governance reasons, I need to fit the drift calculator on a different machine from the one where I will perform inference and have access to the analysis chunks, and the inference machine uses Spark. So I need to fit the object, serialize it, move it to the inference machine, load it in PySpark, and then calculate data drift on the inference data. This does not work with pickle: Spark has no loader for a pickled object like this, but it does have JSON load methods.
JSON would therefore be a good alternative serialization format to pickle.
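To make the request concrete, here is a minimal sketch of what a JSON-serializable fitted state could look like. `HypotheticalKSDriftCalculator`, `to_json` and `from_json` are made-up names for illustration, not NannyML's API, and the two-sample Kolmogorov-Smirnov statistic stands in for whatever drift methods the real `UnivariateDriftCalculator` uses. The idea is that if the fitted state is only plain numbers and lists, it can be written out as text on the fitting machine and rebuilt on the inference machine without pickle.

```python
import bisect
import json
from typing import Dict, List


class HypotheticalKSDriftCalculator:
    """Toy univariate drift calculator (hypothetical, NOT NannyML's API).

    fit() keeps each column's sorted reference values. That fitted state is
    just lists of floats, so it round-trips through JSON without pickle.
    """

    def __init__(self, column_names: List[str]):
        self.column_names = list(column_names)
        self.reference: Dict[str, List[float]] = {}

    def fit(self, reference_data: Dict[str, List[float]]) -> "HypotheticalKSDriftCalculator":
        # Store only what calculate() needs: sorted reference samples per column.
        self.reference = {
            c: sorted(float(v) for v in reference_data[c]) for c in self.column_names
        }
        return self

    def to_json(self) -> str:
        # Plain text that any JSON reader can parse.
        return json.dumps({"column_names": self.column_names, "reference": self.reference})

    @classmethod
    def from_json(cls, payload: str) -> "HypotheticalKSDriftCalculator":
        # Rebuild a fitted calculator from the JSON text, no refitting needed.
        state = json.loads(payload)
        calc = cls(state["column_names"])
        calc.reference = state["reference"]
        return calc

    def calculate(self, analysis_data: Dict[str, List[float]]) -> Dict[str, float]:
        """Two-sample Kolmogorov-Smirnov statistic per column."""
        results = {}
        for c in self.column_names:
            ref = self.reference[c]
            ana = sorted(float(v) for v in analysis_data[c])
            # KS statistic: max |F_ref(x) - F_ana(x)| over all observed values x,
            # using right-continuous empirical CDFs (count of values <= x).
            d = 0.0
            for x in ref + ana:
                f_ref = bisect.bisect_right(ref, x) / len(ref)
                f_ana = bisect.bisect_right(ana, x) / len(ana)
                d = max(d, abs(f_ref - f_ana))
            results[c] = d
        return results


# On the fitting machine: fit, then serialize to JSON text to copy across.
calc = HypotheticalKSDriftCalculator(["feature_a"]).fit({"feature_a": [1, 2, 3, 4]})
payload = calc.to_json()

# On the inference machine: rebuild from the JSON text and calculate drift.
restored = HypotheticalKSDriftCalculator.from_json(payload)
print(restored.calculate({"feature_a": [1, 2, 3, 4]}))  # {'feature_a': 0.0}
```

A real implementation would of course need to cover every method and chunking option the calculator supports, but the same pattern (fitted state as plain JSON, plus a constructor that rebuilds from it) is what I'm asking for.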