donlnz / nonconformist

Python implementation of the conformal prediction framework.
MIT License

saving conformal predictor #8

Open wgmueller1 opened 7 years ago

wgmueller1 commented 7 years ago

Thank you for your library!

Is there any easy way to save the fitted and calibrated conformal predictors for re-use? I'd like to make conformal predictions in an online setting.

I tried just pickling the icp object, but that failed.

donlnz commented 7 years ago

Pickling IcpClassifier or IcpRegressor objects fails because they contain lambda expressions, which are not picklable by default. The easiest fix is to import the dill package, which automatically makes lambda expressions picklable.

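```python
import dill  # importing dill makes the lambdas inside the ICP picklable
import joblib

# ... build, fit and calibrate the conformal predictor `icp` here ...

joblib.dump(icp, 'my_filename')  # store model
icp = joblib.load('my_filename')  # load stored model
```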

Full running example: https://gist.github.com/donlnz/c00791aba32330facf315396f9935c9a
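For readers wondering why plain pickling fails, here is a minimal standard-library sketch. `HypotheticalIcp` and `default_condition` are made-up names, not part of nonconformist; the class simply stores a lambda in `__init__`, mirroring the situation described above. The standard pickler refuses the lambda version, while swapping the lambda for a module-level function (an alternative workaround, not the dill approach suggested in this thread) lets the object round-trip.

```python
import pickle


def default_condition(x):
    # Module-level function: pickled by reference to its module and name
    return 0


class HypotheticalIcp:
    """Hypothetical stand-in for an ICP; not nonconformist's class."""

    def __init__(self, use_lambda=True):
        if use_lambda:
            # Qualified name 'HypotheticalIcp.__init__.<locals>.<lambda>'
            # cannot be looked up again by the standard pickler
            self.condition = lambda x: 0
        else:
            self.condition = default_condition


def try_pickle(obj):
    """Return (True, pickled bytes) on success or (False, error text) on failure."""
    try:
        return True, pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError) as exc:
        # Depending on the Python version, the C pickler reports local
        # objects with either AttributeError or PicklingError
        return False, str(exc)


ok_lambda, info_lambda = try_pickle(HypotheticalIcp(use_lambda=True))
ok_func, data_func = try_pickle(HypotheticalIcp(use_lambda=False))

print(ok_lambda)  # False
print(info_lambda)  # the pickler's error message
print(ok_func)  # True
restored = pickle.loads(data_func)
print(restored.condition("any example"))  # 0
```

Functions are pickled by reference, i.e. by module and qualified name, which is why a lambda whose qualified name contains `<locals>` cannot be found again on load.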

NB: Because of some early decisions in developing nonconformist, the underlying model caches its predictions. Initially, it was only possible to output predictions for one specific significance level at a time, so the underlying model would be applied to the same data multiple times whenever a test set was evaluated at several significance levels; and in test settings, the same conformal predictor is commonly applied to the same test set at each significance level 0.01, 0.02, ..., 0.98, 0.99. Long story short: after calling IcpClassifier.calibrate or IcpClassifier.predict (the same goes for IcpRegressor), the last seen calibration set (or test set) is stored in BaseModelAdapter. This can lead to very large files if the model is saved to disk, and those files may also contain sensitive data. This behaviour will most likely be removed in the future, or at least be made optional. In the meantime, it is suggested that the cache be cleared before storing models to disk.

This is done as follows:

```python
# Clear the cached data held by the BaseModelAdapter before saving
icp.nc_function.model.last_x = None
icp.nc_function.model.last_y = None

joblib.dump(icp, 'my_filename')
```
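To illustrate why the cache inflates saved files, the sketch below uses a hypothetical `HypotheticalCachingAdapter` that only mimics the behaviour described above (keeping the last input in `last_x` and its predictions in `last_y`). It is not nonconformist's `BaseModelAdapter`, and the doubling "model" is a placeholder. The sketch compares pickle sizes before and after clearing the cache.

```python
import pickle


class HypotheticalCachingAdapter:
    """Hypothetical stand-in mimicking the described caching; not nonconformist's BaseModelAdapter."""

    def __init__(self):
        self.last_x = None  # last input passed to predict()
        self.last_y = None  # predictions computed for that input

    def predict(self, x):
        x = list(x)
        # Recompute only when the input differs from the cached one
        if self.last_x is None or self.last_x != x:
            self.last_x = x
            self.last_y = [2 * v for v in x]  # placeholder for a real model
        return list(self.last_y)


adapter = HypotheticalCachingAdapter()
adapter.predict(range(100_000))  # e.g. a large calibration or test set
size_with_cache = len(pickle.dumps(adapter))

# Clear the cache before saving, as recommended above
adapter.last_x = None
adapter.last_y = None
size_cleared = len(pickle.dumps(adapter))

print(size_with_cache, size_cleared)  # the cached version is far larger
```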
wgmueller1 commented 7 years ago

Thank you for the response. What is your environment?

When I run your gist using Python 3.5.2 :: Anaconda 4.3.1 (x86_64) and the following library versions:

dill==0.2.5, joblib==0.11, scikit-learn==0.18.1

I get the following error:

```
PicklingError: Can't pickle <function BaseIcp.__init__.<locals>.<lambda> at 0x1174c70d0>: it's not found as nonconformist.icp.BaseIcp.__init__.<locals>.<lambda>
```

donlnz commented 7 years ago

I'm able to run my code example on two separate setups:

Setup 1: WinPython x64 2.7.6 (Windows), dill==0.2.7, joblib==0.11, scikit-learn==0.15.2

Setup 2: Python 3.5.2 x64 (Linux), dill==0.2.7, joblib==0.11, scikit-learn==0.18.1

Does running `python -v nonconformist_save_load.py` yield any further insights?
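When comparing environments like the ones listed above, a small script can collect the relevant versions in one go. This is a sketch, assuming Python 3.8+ for `importlib.metadata`, so it would not run on the Python 2.7 or 3.5 setups in this thread; `environment_report` is a hypothetical helper name. Distribution names are passed as installed, e.g. `scikit-learn` rather than `sklearn`.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=("dill", "joblib", "scikit-learn")):
    """Collect the Python version and installed versions of the given distributions."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report


for key, value in environment_report().items():
    print(f"{key}: {value or 'not installed'}")
```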