This is a fairly straightforward task that will go a long way towards improving model functionality and maintainability of the code base.
Modules: lifetimes.models.__init__.BaseModel class object
Issue: An ArviZ InferenceData object is created as a model attribute whenever model.fit() is called. Currently, model persistence entails extracting model parameters from this attribute and dumping them into a memory-optimized JSON file. However, once this JSON file is loaded back into a model, ArviZ plotting and statistical functions are no longer supported. The pre/post-processing code to format this JSON also adds unnecessary complexity to the BaseModel class and could make future maintenance more difficult. Plus, let's be honest, this isn't a 350 GB NLP model; reducing a <10 MB InferenceData object down to a <4 MB JSON file is not worth the hassle.
Work Summary: Replace the JSON formatting code in _unload_params(), fit(), save_params(), and load_params() with ArviZ methods like arviz.InferenceData.to_json() and arviz.from_json() (https://arviz-devs.github.io/arviz/api/data.html). A rough sketch of what this could look like is shown below.
remove_hypers can also be removed as a model class attribute, and I'm not opposed to renaming save_params() and load_params() to save_model() and load_model() either.
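For reference, here is a minimal sketch of what the simplified persistence methods could look like after this change. The idata attribute name and the method bodies are assumptions on my part, not the actual lifetimes API; the only firm pieces are the arviz.InferenceData.to_json() and arviz.from_json() calls from the linked ArviZ data docs.

```python
import arviz as az


class BaseModel:
    # Sketch of the proposed persistence methods, assuming the InferenceData
    # produced by fit() is stored on an `idata` attribute (the attribute name
    # is an assumption for illustration, not the real lifetimes API).

    def save_model(self, filename: str) -> None:
        # Serialize the entire InferenceData object instead of a hand-rolled,
        # parameter-only JSON file.
        self.idata.to_json(filename)

    def load_model(self, filename: str) -> "BaseModel":
        # Restore the InferenceData attribute directly from the JSON file, so
        # ArviZ plotting and statistical functions keep working on a reloaded
        # model.
        self.idata = az.from_json(filename)
        return self
```

Usage would then be as simple as model.save_model("model.json") after fitting, and az.plot_trace(BaseModel().load_model("model.json").idata) later on, with no custom formatting code in between.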
Other Comments: JSON is the preferred format for model persistence. Pickle files have their place for the fast reads/writes demanded by online learning and for passing objects between CPU threads, but the added complexity of their implementation just isn't worth it for a model that is only saved and loaded once. They are also a security risk, since malware can be obscured in a pickled object. I could totally see a hacker with prior system access overwriting a .pkl model file with an executable that exfiltrates customer IDs whenever the model is run.