arogozhnikov / hep_ml

Machine Learning for High Energy Physics.
https://arogozhnikov.github.io/hep_ml/

Issue interfacing uboost classifiers with rep grid search: Exception an integer is required #49

Closed. MaggaP closed this issue 6 years ago.

MaggaP commented 6 years ago

Hi,

I have been trying to add uBoost to my grid search in REP and have run into some difficulties. I have made a minimal example that reproduces the error I get:

```python
import numpy as np
import pandas
from hep_ml.uboost import uBoostClassifier
from rep.estimators import SklearnClassifier
from rep.metaml import FoldingScorer, GridOptimalSearchCV, RandomParameterOptimizer
from rep.report.metrics import RocAuc

df = pandas.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
df['E'] = 1
df.loc[3:, 'E'] = 0  # rows 3 onwards are labelled 0 (avoids chained assignment)
labels = df['E']
data = df.drop('E', axis=1)

uni_feats = 'C'
variables = ['A', 'B', 'D']

uboost_clf = uBoostClassifier(uniform_features=uni_feats, uniform_label=1,
                              train_features=variables)

# Parameter grid that the random optimizer samples from
grid_param = {}
grid_param['n_estimators'] = [50, 100, 125, 150]
grid_param['n_neighbors'] = [50, 51, 52, 53]

generator = RandomParameterOptimizer(grid_param, n_evaluations=2)
scorer = FoldingScorer(RocAuc(), folds=3, fold_checks=3)
estimator = SklearnClassifier(uboost_clf)
grid_finder = GridOptimalSearchCV(estimator, generator, scorer, parallel_profile='threads-4')
grid_finder.fit(data, labels)
```

This always results in the error:

Performing grid search in 4 threads
ERROR:rep.metaml.gridsearch:Fail during training on the node Exception an integer is required Parameters n_estimators=150, n_neighbors=52
ERROR:rep.metaml.gridsearch:Fail during training on the node Exception an integer is required Parameters n_estimators=125, n_neighbors=52
2 evaluations done

I have looked into it but had no luck finding the source of the exception, and I'm a bit puzzled as to what is causing it; the same code works for a number of other classifiers.

Is this simply something that uBoost does not support?

Any help or clarification here would be greatly appreciated,
Ryan
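The log above reports only the exception message ("an integer is required"), not the line that raised it. One way to locate the failure is to fit a single parameter combination outside the grid search and print the full traceback. The sketch below is a hypothetical debugging aid, not part of REP or hep_ml: `fit_one_point` and the `ToyEstimator` stand-in are illustrative names, and the helper assumes only the scikit-learn style `set_params(**params)` and `fit(X, y)` methods.

```python
import copy
import traceback


def fit_one_point(estimator, params, X, y):
    """Fit a copy of `estimator` with one parameter combination.

    Returns the full traceback text if fitting fails, or None on success.
    Hypothetical helper: relies only on scikit-learn style
    set_params(**params) and fit(X, y) methods.
    """
    candidate = copy.deepcopy(estimator)  # leave the caller's estimator untouched
    try:
        candidate.set_params(**params)
        candidate.fit(X, y)
    except Exception:
        # Keep the whole stack, not just the message a search loop would log
        return traceback.format_exc()
    return None


class ToyEstimator:
    """Hypothetical stand-in that fails when k exceeds the sample count."""

    def __init__(self, k=1):
        self.k = k

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        if self.k > len(X):
            raise ValueError("k=%d exceeds the %d samples" % (self.k, len(X)))
        return self


print(fit_one_point(ToyEstimator(), {"k": 2}, [1, 2, 3], [0, 1, 0]))  # None
report = fit_one_point(ToyEstimator(), {"k": 5}, [1, 2], [0, 1])
print(report.splitlines()[-1])  # ValueError: k=5 exceeds the 2 samples
```

Applied to the snippet in the question, the failing node would be reproduced with something like `print(fit_one_point(uboost_clf, {'n_estimators': 150, 'n_neighbors': 52}, data, labels))`, assuming `uBoostClassifier` follows the scikit-learn estimator interface.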

arogozhnikov commented 6 years ago

Hi Ryan, there was a bug; I've just fixed it. You can upgrade with:

```shell
pip uninstall hep_ml --yes
pip install https://github.com/arogozhnikov/hep_ml/archive/master.zip
```
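After reinstalling, it can help to confirm that Python actually sees a hep_ml installation. A minimal sketch using the standard-library `importlib.metadata` module (Python 3.8+); `installed_version` is a hypothetical helper name, not part of hep_ml.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string, or None if hep_ml is not installed
print(installed_version("hep_ml"))
```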

Here is the fixed, working example:

```python
from hep_ml.uboost import uBoostClassifier
from rep.metaml import FoldingScorer, RandomParameterOptimizer, GridOptimalSearchCV
from rep.report.metrics import RocAuc
from rep.estimators import SklearnClassifier

import pandas
import numpy as np

# 300 random events; every second row is labelled 0, the rest 1
df = pandas.DataFrame(np.random.randn(300, 4), columns=['A', 'B', 'C', 'D'])
df['E'] = 1
df.loc[::2, 'E'] = 0  # avoids chained assignment
labels = df['E']
data = df.drop('E', axis=1)
uni_feats = ['C']  # uniform_features given as a list of column names
variables = ['A', 'B', 'D']

uboost_clf = uBoostClassifier(uniform_features=uni_feats, uniform_label=1, train_features=variables)

grid_param = {}
grid_param['n_estimators'] = [50, 100, 125, 150]
grid_param['n_neighbors'] = [50, 51, 52, 53]

generator = RandomParameterOptimizer(grid_param, n_evaluations=2)
scorer = FoldingScorer(RocAuc(), folds=3, fold_checks=3)
# REP wrapper; unused below, since the search is given uboost_clf directly
estimator = SklearnClassifier(uboost_clf)
grid_finder = GridOptimalSearchCV(uboost_clf, generator, scorer, parallel_profile='threads-4')
grid_finder.fit(data, labels.values)
```

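Besides the library fix, the working example enlarges the toy dataset from 8 to 300 rows. One plausible reason, which the thread does not state: the grid asks for up to 53 neighbours, and a k-nearest-neighbour search needs at least that many candidate events. The sketch below counts the label-1 events left for training in each of 3 folds. It assumes a contiguous split (REP's `FoldingScorer` may partition differently) and that uBoost searches for neighbours among `uniform_label` events only; `min_class_count_in_training` is a hypothetical helper.

```python
import numpy as np


def min_class_count_in_training(labels, folds, target=1):
    """Smallest number of `target`-labelled samples left for training when
    one of `folds` contiguous parts is held out at a time.

    Assumes a plain contiguous split, so treat the result as an estimate.
    """
    labels = np.asarray(labels)
    parts = np.array_split(np.arange(len(labels)), folds)
    counts = []
    for held_out in parts:
        train_mask = np.ones(len(labels), dtype=bool)
        train_mask[held_out] = False  # drop the held-out fold
        counts.append(int((labels[train_mask] == target).sum()))
    return min(counts)


# Labels from the 8-row snippet in the question: rows 0-2 are 1, the rest 0
small = np.array([1, 1, 1, 0, 0, 0, 0, 0])
# Labels from the working example: 300 rows, every second row set to 0
large = np.ones(300, dtype=int)
large[::2] = 0

print(min_class_count_in_training(small, folds=3))  # 0
print(min_class_count_in_training(large, folds=3))  # 100
```

With the 8-row data one training split contains no label-1 events at all, while the 300-row data leaves 100 in every split, above the largest `n_neighbors` value of 53.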
MaggaP commented 6 years ago

Hi Alex,

Thanks for looking at this so quickly; I can confirm it's now working for me too.

Cheers, Ryan
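Comparing the question's snippet with the working example, the latter also passes `uniform_features` as the list `['C']` rather than the bare string `'C'`. A list is the safer form in general, because any code that iterates over a string sees its individual characters. The sketch below only illustrates that Python behaviour with a hypothetical `as_feature_list` helper; it does not claim this is how hep_ml handles the argument, nor that it caused the error in this issue.

```python
from typing import List, Sequence, Union


def as_feature_list(features: Union[str, Sequence[str]]) -> List[str]:
    """Normalise one column name or a sequence of names to a list.

    Hypothetical helper for illustration; not part of hep_ml or REP.
    """
    if isinstance(features, str):
        # A bare string is one column name, not a sequence of characters
        return [features]
    return list(features)


# Iterating over a bare string walks its characters, which only looks
# harmless when the column name is a single letter such as 'C'
print(list('C'))                # ['C']
print(list('mass'))             # ['m', 'a', 's', 's']
print(as_feature_list('mass'))  # ['mass']
print(as_feature_list(['C']))   # ['C']
```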