NannyML / nannyml

nannyml: post-deployment data science in python
https://www.nannyml.com/
Apache License 2.0

Make UnivariateDriftCalculator (and other objects) JSON serializable #394

Open KGoldsmith11 opened 6 months ago

KGoldsmith11 commented 6 months ago

Currently, the UnivariateDriftCalculator object supports serialization via pickle. However, this format is not compatible with Apache Spark, which I intend to use for processing on the inference side.

For governance reasons, I need to fit the drift calculator object on a different machine from the one where I will perform inference and have access to the analysis chunks, and the machine performing inference uses Spark. Therefore I need to fit the object, serialise it, move it to the inference machine, load it in PySpark and then calculate the data drift on the inference data. This does not work with pickle, because Spark doesn't have pickle loading methods, but it does have JSON load methods.

JSON would be a good alternative to pickle, as there are JSON load methods in Spark.
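To illustrate the kind of round trip I have in mind, here is a minimal sketch in plain Python. It does not use NannyML's API: `HistogramDriftState`, `to_json`, `from_json` and `js_distance` are hypothetical names I made up, and the binning and Jensen-Shannon computation are simplified stand-ins for whatever the real calculator stores after fitting. The point is only that the fitted state reduces to plain lists and numbers, which JSON can carry between machines.

```python
import json
import math


class HistogramDriftState:
    """Hypothetical stand-in for a fitted drift calculator (NOT NannyML's API)."""

    def __init__(self, edges, reference_probs):
        self.edges = edges                      # bin edges learned from reference data
        self.reference_probs = reference_probs  # share of reference values per bin

    @classmethod
    def fit(cls, reference_values, n_bins=4):
        # Learn equal-width bins and the reference distribution over them.
        lo, hi = min(reference_values), max(reference_values)
        width = (hi - lo) / n_bins
        edges = [lo + i * width for i in range(n_bins + 1)]
        return cls(edges, cls._probs(reference_values, edges))

    @staticmethod
    def _probs(values, edges):
        counts = [0] * (len(edges) - 1)
        for v in values:
            idx = 0  # values below the first edge fall into the first bin
            for i in range(len(counts)):
                if v >= edges[i]:
                    idx = i  # values above the last edge end up in the last bin
            counts[idx] += 1
        return [c / len(values) for c in counts]

    def js_distance(self, analysis_values):
        # Jensen-Shannon distance (base 2) between reference and analysis bins.
        p = self.reference_probs
        q = self._probs(analysis_values, self.edges)
        m = [(a + b) / 2 for a, b in zip(p, q)]

        def kl(x, y):
            return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

        return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))

    def to_json(self):
        # The fitted state is only lists of floats, so json can encode it directly.
        return json.dumps({"edges": self.edges, "reference_probs": self.reference_probs})

    @classmethod
    def from_json(cls, payload):
        data = json.loads(payload)
        return cls(data["edges"], data["reference_probs"])


# "Fitting machine": fit on reference data and serialise to a JSON string.
reference = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
fitted = HistogramDriftState.fit(reference)
payload = fitted.to_json()

# "Inference machine": rebuild the state from JSON and score new data.
restored = HistogramDriftState.from_json(payload)
drift = restored.js_distance([0.9] * 8)
```

In this sketch `payload` is an ordinary string, so it could be written to a file on the fitting machine and read back on the inference side without pickle. Whether the real calculators can be reduced to such a state is something I can't judge from the outside.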

stale[bot] commented 4 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.