Closed xloffree closed 1 year ago
Hi @xloffree,
First of all, you may save a fitted model in PiML using the following approach.
import dill

clf = exp.get_model("GAM").estimator

with open('name_model.pkl', 'wb') as file:
    dill.dump(clf, file)

with open('name_model.pkl', 'rb') as file:
    clf_load = dill.load(file)

train_x = exp.get_model("GAM").get_data(train=True)[0]
clf_load.predict(train_x)
You may also register the loaded model into PiML using the demo at https://colab.research.google.com/github/SelfExplainML/PiML-Toolbox/blob/main/examples/Example_ExternalModels.ipynb#scrollTo=7WGJ8PzutkLh, "Scenario 2: Register external fitted models with dataset".
Second, all the interactive panels used in PiML are based on the Python runtime. Currently, we don't have functionality to export interactive results; the best way is to save the notebook as static HTML: a) click "Widgets -> Save Notebook Widget State"; b) export it via "File -> Download as -> HTML (.html)".
Hi, thank you for your help with this. Has this solution worked for you with PiML? When I try this solution, this line:
with open('name_model.pkl', 'rb') as file: clf_load = dill.load(file)
results in a recursion error every time. I tried changing the recursion limit, but even with a recursion limit of 10000000 I still run into this error. Increasing the recursion limit indefinitely just causes the kernel to crash.
The error is as follows:
RecursionError                            Traceback (most recent call last)
Cell In [39], line 2
      1 with open('name_model.pkl', 'rb') as file:
----> 2     clf_load = dill.load(file)

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/site-packages/dill/_dill.py:272, in load(file, ignore, kwds)
    266 def load(file, ignore=None, **kwds):
    267     """
    268     Unpickle an object from a file.
    269
    270     See :func:`loads` for keyword arguments.
    271     """
--> 272     return Unpickler(file, ignore=ignore, **kwds).load()

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/site-packages/dill/_dill.py:419, in Unpickler.load(self)
    418 def load(self): #NOTE: if settings change, need to update attributes
--> 419     obj = StockUnpickler.load(self)
    420     if type(obj).__module__ == getattr(_main_module, '__name__', '__main__'):
    421         if not self._ignore:
    422             # point obj class to __main__

File piml/models/glm.py:32, in piml.models.glm.GLMRegressor.__getattr__()
File piml/models/glm.py:32, in piml.models.glm.GLMRegressor.__getattr__()
[... skipping similar frames: piml.models.glm.GLMRegressor.__getattr__ at line 32 (9999967 times)]
File piml/models/glm.py:32, in piml.models.glm.GLMRegressor.__getattr__()

RecursionError: maximum recursion depth exceeded while calling a Python object
Any help with this would be very appreciated. When using PiML for research purposes, being able to save a trained model is essential for reproducibility. Thank you!
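(For background, independent of PiML's actual internals: a traceback dominated by repeated `__getattr__` frames during unpickling usually means the class's `__getattr__` touches an attribute that has not been restored yet. Since pickle creates the instance without calling `__init__` and only afterwards restores its `__dict__`, the very first delegated lookup re-enters `__getattr__` and recurses without bound. A minimal, purely illustrative sketch with a hypothetical `Wrapper` class:)

```python
import pickle

class Wrapper:
    """Delegates unknown attribute lookups to a wrapped model."""
    def __init__(self, model):
        self.__model__ = model

    def __getattr__(self, name):
        # Called only when normal lookup fails. During unpickling,
        # __model__ is not in __dict__ yet, so self.__model__ below
        # triggers __getattr__ again -> unbounded recursion.
        return getattr(self.__model__, name)

blob = pickle.dumps(Wrapper([1, 2, 3]))  # pickling itself succeeds

try:
    pickle.loads(blob)  # pickle probes __setstate__ before state is restored
    outcome = "loaded"
except RecursionError:
    outcome = "RecursionError"
print(outcome)
```

A defensive `__getattr__` that raises `AttributeError` when the internal attribute is not yet present avoids the loop; extracting the inner object before saving (as in the `.estimator.__model__` workaround below) sidesteps the wrapper entirely.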
Hi @xloffree,
For GLM, you may use the following code to do model saving,
import dill

clf = exp.get_model("GLM").estimator.__model__

with open('name_model.pkl', 'wb') as file:
    dill.dump(clf, file)

with open('name_model.pkl', 'rb') as file:
    clf_load = dill.load(file)

train_x = exp.get_model("GLM").get_data(train=True)[0]
clf_load.predict(train_x)
Thank you very much. This works. How can I use this trained model to predict on other datasets? Is this functionality exclusively part of PiML or is there third party documentation I can view for more background on how this code works?
Thank you
> Thank you very much. This works. How can I use this trained model to predict on other datasets? Is this functionality exclusively part of PiML or is there third party documentation I can view for more background on how this code works?
If you have another dataset with the same set of input features, then you can use this model to get predictions. Assume the new data has covariates X (on the raw scale, without preprocessing); then you can get predictions from the fitted model in PiML as follows.
clf = exp.get_model("GLM").estimator
xx = exp.get_data(x=X)
clf.predict(xx)
What datatype should X be here? Is it a dataframe that includes all of the data for all of the predictors?
Thanks
Hi,
Is different code required to save each different type of built-in model in PiML? It seems whenever I try to save a different model, I run into a new error. Is there somewhere where I can see the code for how to save each different type of model? Thank you
> What datatype should X be here? Is it a dataframe that includes all of the data for all of the predictors?
X is a NumPy array of the selected features. It should have the same format as the uploaded raw data, without any preprocessing.
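(As an illustration of that format, with hypothetical column names matching the bike-sharing example discussed later in this thread: X is just the selected feature columns of the raw data, converted to a NumPy array, with the response column excluded.)

```python
import pandas as pd

# Hypothetical raw data in the same layout as the uploaded dataset;
# 'cnt' plays the role of the response and is excluded from X.
df = pd.DataFrame({
    "season": [1.0, 1.0, 1.0],
    "yr":     [0.0, 0.0, 0.0],
    "mnth":   [1.0, 1.0, 1.0],
    "hr":     [0.0, 1.0, 2.0],
    "cnt":    [16.0, 40.0, 32.0],
})

selected = ["season", "yr", "mnth", "hr"]  # features kept after feature selection
X = df[selected].to_numpy()                # raw scale, no preprocessing
print(X.shape)  # (3, 4): n samples x p selected predictors
```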
> Hi,
> Is different code required to save each different type of built-in model in PiML? It seems whenever I try to save a different model, I run into a new error. Is there somewhere where I can see the code for how to save each different type of model? Thank you
For the GLMRegressor model, use
import dill

clf = exp.get_model("GLM").estimator.__model__

with open('name_model.pkl', 'wb') as file:
    dill.dump(clf, file)

with open('name_model.pkl', 'rb') as file:
    clf_load = dill.load(file)

train_x = exp.get_model("GLM").get_data(train=True)[0]
clf_load.predict(train_x)
For all the remaining models, you can use

import dill

clf = exp.get_model("GAM").estimator

with open('name_model.pkl', 'wb') as file:
    dill.dump(clf, file)

with open('name_model.pkl', 'rb') as file:
    clf_load = dill.load(file)

train_x = exp.get_model("GAM").get_data(train=True)[0]
clf_load.predict(train_x)
BTW, we will provide a unified API for model saving in the next release.
> clf = exp.get_model("GLM").estimator
> xx = exp.get_data(x=X)
> clf.predict(xx)
I still do not understand what this means. I have tried to pass df and df.columns as X but it does not work. Do you have an example of what X should be?
Thank you
Would it be possible for us to discuss PiML over a zoom meeting? That might be more efficient than messages on this page.
Hi, here X is just an n*p numpy array, where n is the sample size and p is the number of predictors (excluding unselected features and the response feature).
For instance, assume the raw data is a pd.DataFrame as follows,
| season | yr | mnth | hr | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | cnt |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 6.0 | 0.0 | 1.0 | 0.24 | 0.2879 | 0.81 | 0.0000 | 16.0 |
| 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 6.0 | 0.0 | 1.0 | 0.22 | 0.2727 | 0.80 | 0.0000 | 40.0 |
| 1.0 | 0.0 | 1.0 | 2.0 | 0.0 | 6.0 | 0.0 | 1.0 | 0.22 | 0.2727 | 0.80 | 0.0000 | 32.0 |
| 1.0 | 0.0 | 1.0 | 3.0 | 0.0 | 6.0 | 0.0 | 1.0 | 0.24 | 0.2879 | 0.75 | 0.0000 | 13.0 |
| 1.0 | 0.0 | 1.0 | 4.0 | 0.0 | 6.0 | 0.0 | 1.0 | 0.24 | 0.2879 | 0.75 | 0.0000 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1.0 | 1.0 | 12.0 | 19.0 | 0.0 | 1.0 | 1.0 | 2.0 | 0.26 | 0.2576 | 0.60 | 0.1642 | 119.0 |
| 1.0 | 1.0 | 12.0 | 20.0 | 0.0 | 1.0 | 1.0 | 2.0 | 0.26 | 0.2576 | 0.60 | 0.1642 | 89.0 |
| 1.0 | 1.0 | 12.0 | 21.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.26 | 0.2576 | 0.60 | 0.1642 | 90.0 |
| 1.0 | 1.0 | 12.0 | 22.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.26 | 0.2727 | 0.56 | 0.1343 | 61.0 |
| 1.0 | 1.0 | 12.0 | 23.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.26 | 0.2727 | 0.65 | 0.1343 | 49.0 |

Then you select season, yr, mnth, hr as the covariates (in exp.data_summary and exp.feature_select), and cnt as the response (in exp.data_prepare).
The X is supposed to be a np.array that looks like:
| season | yr | mnth | hr |
| -- | -- | -- | -- |
| 1.0 | 0.0 | 1.0 | 0.0 |
| 1.0 | 0.0 | 1.0 | 1.0 |
| 1.0 | 0.0 | 1.0 | 2.0 |
| 1.0 | 0.0 | 1.0 | 3.0 |
| 1.0 | 0.0 | 1.0 | 4.0 |
| ... | ... | ... | ... |
| 1.0 | 1.0 | 12.0 | 19.0 |
| 1.0 | 1.0 | 12.0 | 20.0 |
| 1.0 | 1.0 | 12.0 | 21.0 |
| 1.0 | 1.0 | 12.0 | 22.0 |
| 1.0 | 1.0 | 12.0 | 23.0 |

For example, X can be the selected covariates of the loaded data:
X = exp.dataset.x
clf = exp.get_model("GLM").estimator
xx = exp.get_data(x=X)
clf.predict(xx)
Hope that helps.
Saving a model works for GLM and GAM. However, none of the other models will save; they result in an error:
PicklingError                             Traceback (most recent call last)
Cell In [18], line 4
      1 clf = exp.get_model("GAMI-Net").estimator
      3 with open('LVS_GAMI-Net.pkl', 'wb') as file:
----> 4     dill.dump(clf, file)

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/site-packages/dill/_dill.py:235, in dump(obj, file, protocol, byref, fmode, recurse, kwds)
--> 235 Pickler(file, protocol, **_kwds).dump(obj)

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/site-packages/dill/_dill.py:394, in Pickler.dump(self, obj)
--> 394 StockPickler.dump(self, obj)

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/pickle.py:487, in _Pickler.dump(self, obj)
--> 487 self.save(obj)

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/site-packages/dill/_dill.py:388, in Pickler.save(self, obj, save_persistent_id)
--> 388 StockPickler.save(self, obj, save_persistent_id)

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/pickle.py:603, in _Pickler.save(self, obj, save_persistent_id)
--> 603 self.save_reduce(obj=obj, *rv)

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/pickle.py:717, in _Pickler.save_reduce(self, func, args, state, listitems, dictitems, state_setter, obj)
--> 717 save(state)

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/site-packages/dill/_dill.py:388, in Pickler.save(self, obj, save_persistent_id)
--> 388 StockPickler.save(self, obj, save_persistent_id)

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/pickle.py:560, in _Pickler.save(self, obj, save_persistent_id)
--> 560 f(self, obj)  # Call unbound method with explicit self

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/site-packages/dill/_dill.py:1186, in save_module_dict(pickler, obj)
--> 1186 StockPickler.save_dict(pickler, obj)

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/pickle.py:971, in _Pickler.save_dict(self, obj)
--> 971 self._batch_setitems(obj.items())

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/pickle.py:997, in _Pickler._batch_setitems(self, items)
--> 997 save(v)

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/site-packages/dill/_dill.py:388, in Pickler.save(self, obj, save_persistent_id)
--> 388 StockPickler.save(self, obj, save_persistent_id)

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/pickle.py:589, in _Pickler.save(self, obj, save_persistent_id)
--> 589 self.save_global(obj, rv)

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/pickle.py:1070, in _Pickler.save_global(self, obj, name)
--> 1070 raise PicklingError(
             "Can't pickle %r: it's not found as %s.%s" %
             (obj, module_name, name)) from None

PicklingError: Can't pickle <cyfunction Model.register_model.
Thanks for reporting this issue.
You may use the following scripts to save and load a fitted model except for GLMRegressor.
import dill

clf = exp.get_model("GAM").estimator
clf.__sklearn_is_fitted__ = lambda: True

with open('name_model.pkl', 'wb') as file:
    dill.dump(clf, file)

with open('name_model.pkl', 'rb') as file:
    clf_load = dill.load(file)

train_x = exp.get_model("GAM").get_data(train=True)[0]
clf_load.predict(train_x)
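(As an aside on why dill is used throughout this thread rather than the standard-library pickle: dill serializes function objects, including lambdas like the `__sklearn_is_fitted__` patch above, by value, whereas pickle tries to reference them by name and fails. A minimal sketch with a hypothetical `Estimator` class:)

```python
import pickle
import dill  # assumed installed, as in the snippets above

class Estimator:               # hypothetical stand-in for a fitted model
    pass

clf = Estimator()
clf.__sklearn_is_fitted__ = lambda: True   # lambda stored on the instance

blob = dill.dumps(clf)         # dill serializes the lambda by value
clf2 = dill.loads(blob)
print(clf2.__sklearn_is_fitted__())        # True

try:
    pickle.dumps(clf)          # stdlib pickle cannot serialize lambdas
    pickle_ok = True
except (pickle.PicklingError, AttributeError):
    pickle_ok = False
print(pickle_ok)               # False
```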
Thanks. I am able to save every model as a .pkl file now. How can I easily load the model and view its interpretability metrics within PiML? For example, if I have an EBM model saved as a .pkl, and I want to view the results of exp.model_interpret(), how can I do this without retraining?
Thank you
@xloffree,
You can do the following to register it into the PiML workflow:
pipeline = exp.make_pipeline(model=clf_load)
exp.register(pipeline, "loaded_model")
exp.model_interpret()
Note that in this case, you need to do data loading, summary, and preparation first, so that all the data are available. An alternative way is to specify the required data information in exp.register. You can find the details in the docs of exp.register function, and the example usage in https://colab.research.google.com/github/SelfExplainML/PiML-Toolbox/blob/main/examples/Example_ExternalModels.ipynb.
Hi,
Is there a way to save models generated in PiML so that I do not need to run the program and train the model each time? Also, what is the best way to export results? Is there any way to export results such that the widgets are still interactive? I have just been saving the notebook as an HTML file in order to share results.
Thank you