arogozhnikov / hep_ml

Machine Learning for High Energy Physics.

Saving uboost BDT with tf/keras base estimators #63

Open srishtibhasin opened 3 years ago

srishtibhasin commented 3 years ago


I am trying to use a uBoost BDT to achieve uniform signal efficiency. My base estimator is a Keras model (TensorFlow 2.2), which I have wrapped as a scikit-learn BaseEstimator subclass using tensorflow.keras.wrappers.scikit_learn.KerasClassifier. Training seems to work fine, but I get an error when I try to save the uBoost classifier with pickle/joblib. The error is TypeError: can't pickle _thread.RLock objects (full error at bottom; it is mostly a long chain of calls to pickle).

From looking it up, it seems the error usually has to do with the way TensorFlow is run, but I'm only creating a simple model and fitting it, and all the session handling should be taken care of in this version of tf/keras. Maybe this answer is related, i.e. perhaps there is a call to something from the model that leaves an unserializable tensor object? As I am using the BDT, not the classifier, I assume it has nothing to do with parallel processes either?

Please let me know if you know what is causing the issue or if there is some way I can work around it.
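For readers who want to see the error in isolation, here is a minimal sketch using only the standard library. It assumes nothing about Keras internals: the dictionary `model_state` is a made-up stand-in for any object that holds a thread lock somewhere among its attributes, which is enough to make pickle fail with the same kind of TypeError.

```python
import pickle
import threading

# Stand-in for a fitted model: plain data plus one thread lock.
# (Illustrative only; this is not how Keras stores its state.)
model_state = {"weights": [0.1, 0.2, 0.3], "lock": threading.RLock()}

try:
    pickle.dumps(model_state)
    pickle_error = None
except TypeError as exc:
    pickle_error = exc

# The exact wording of the message differs between Python versions.
print(type(pickle_error).__name__, pickle_error)

# Without the lock, the same container pickles and round-trips fine.
clean_state = {k: v for k, v in model_state.items() if k != "lock"}
restored = pickle.loads(pickle.dumps(clean_state))
```

The point is that a single unpicklable member anywhere in the object graph is enough to make the whole dump fail, which is why the traceback is mostly nested pickle calls.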



arogozhnikov commented 3 years ago

Poor picklability of keras is a long-known issue (you can google keras together with the same error message).

You may be lucky and find that the cause is something fixable, e.g. some variables being passed in a lambda rather than through calls.

Otherwise, I'm not sure there is a simple solution.
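To check whether the culprit is something fixable, one could test each attribute of the wrapper separately. The sketch below is an assumption-laden illustration: `find_unpicklable` is a hypothetical helper, and `wrapper` is a made-up stand-in with a lambda and a lock, not a real KerasClassifier.

```python
import pickle
import threading
from types import SimpleNamespace


def find_unpicklable(obj):
    """Return {attribute name: error text} for attributes that fail to pickle."""
    failures = {}
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # pickle raises several exception types
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


# Hypothetical stand-in for an estimator wrapper.
wrapper = SimpleNamespace(
    epochs=10,                # plain data pickles fine
    build_fn=lambda: None,    # lambdas cannot be pickled by the standard pickler
    lock=threading.RLock(),   # neither can thread locks
)

failures = find_unpicklable(wrapper)
for name, message in failures.items():
    print(name, "->", message)
```

If the only offender turns out to be a lambda, replacing it with a named module-level function may be enough. If it is something buried inside the model itself, this kind of check will at least show where.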

srishtibhasin commented 3 years ago

Yes, but of course keras has its own save_model functionality. So do you think it is simply not possible to save a uBoost model which is based on a keras model?

arogozhnikov commented 3 years ago

> it is simply not possible to save a uBoost model which is based on a keras model?

It all comes down to the picklability of the items. hep_ml's uBoost is completely picklable, but the keras model is not.

Option 1. Ask keras maintainers why your model is not picklable.
Option 2. Store keras models separately from uBoost.

import joblib

# clf is the fitted uBoost classifier
estimators = clf.estimators_  # list of keras models
# TODO save estimators somehow using keras tools
# detach the estimators so that the rest of clf can be pickled
clf.estimators_ = None
with open('uboost.pkl', 'wb') as f:
    joblib.dump(clf, f)

# loading
with open('uboost.pkl', 'rb') as f:
    clf = joblib.load(f)

estimators = ...  # load estimators with keras tools
clf.estimators_ = estimators

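A slightly more defensive variant of the same idea is sketched below. It is only an illustration under stated assumptions: `dump_without` is a hypothetical helper, it uses the standard pickle module instead of joblib, and the `clf` here is a made-up stand-in whose "estimators" hold a lock so that they are unpicklable. The attribute name `estimators_` is taken from the snippet above. Saving the detached estimators themselves (the TODO above) is left to keras tools and is not shown.

```python
import os
import pickle
import tempfile
import threading
from types import SimpleNamespace


def dump_without(obj, path, attr="estimators_"):
    """Pickle obj to path with obj.<attr> set to None, then restore it.

    Returns the detached value so the caller can save it by other means.
    """
    detached = getattr(obj, attr)
    setattr(obj, attr, None)
    try:
        with open(path, "wb") as f:
            pickle.dump(obj, f)
    finally:
        setattr(obj, attr, detached)  # leave the in-memory object usable
    return detached


# Made-up stand-in for a fitted ensemble with unpicklable base estimators.
fake_estimators = [{"lock": threading.RLock()} for _ in range(3)]
clf = SimpleNamespace(learning_rate=0.5, estimators_=fake_estimators)

path = os.path.join(tempfile.mkdtemp(), "uboost.pkl")
detached = dump_without(clf, path)

with open(path, "rb") as f:
    loaded = pickle.load(f)

# After loading, the estimators have to be attached again by hand.
loaded.estimators_ = detached
```

The try/finally means the in-memory classifier keeps its estimators even if the dump fails, so it remains usable after saving.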
arogozhnikov commented 3 years ago

There is option 3 as well: find a truly sklearn-compatible NN package =)

srishtibhasin commented 3 years ago

Ok I will try to implement option 2, thanks!