HDI-Project / ATM

Auto Tune Models - A multi-tenant, multi-data system for automated machine learning (model selection and tuning).
https://hdi-project.github.io/ATM/
MIT License

Wrong keywords into ML models #8

Closed kkarrancsu closed 6 years ago

kkarrancsu commented 6 years ago

Hello, I am trying to test run your classifiers on our data, and am getting some errors when the system tries various classifiers. The relevant portions of the error messages are pasted below:

Chose parameters for method dt:
    n_jobs = -1
    min_samples_leaf = 1
    n_estimators = 100
    criterion = entropy
    max_features = 0.950735797858
    max_depth = 6
TypeError: __init__() got an unexpected keyword argument 'n_jobs'

Chose parameters for method dt:
    C = 0.00359684119303
    tol = 0.000357435603328
    fit_intercept = True
    penalty = l2
    _scale = True
    dual = False
    class_weight = auto
TypeError: __init__() got an unexpected keyword argument 'C'

Chose parameters for method logreg:
    n_jobs = -1
    min_samples_leaf = 1
    n_estimators = 100
    criterion = gini
    max_features = 0.218919710352
    max_depth = 7
TypeError: __init__() got an unexpected keyword argument 'min_samples_leaf'

Based on these errors, it seems that the hyperparameters intended for scikit-learn's DecisionTreeClassifier are being mixed up with those intended for scikit-learn's LogisticRegression. For example, LogisticRegression does not have a "min_samples_leaf" hyperparameter, and DecisionTreeClassifier does not have C or n_jobs as hyperparameters. Digging around, the methods/decision_tree.json and methods/logistic_regression.json files seem correct, so I'm not sure why this is getting mixed up.
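For what it's worth, the mismatch is easy to check programmatically. The sketch below uses `inspect.signature` against toy stand-in classes (hypothetical, mirroring the shape of the real scikit-learn `__init__` signatures, not ATM's actual code) to show which of the chosen keywords the target estimator would actually accept:

```python
import inspect

# Hypothetical stand-ins mirroring the shape of the relevant scikit-learn
# __init__ signatures (the real classes live in sklearn.tree and
# sklearn.linear_model).
class DecisionTreeClassifier:
    def __init__(self, criterion="gini", max_depth=None,
                 min_samples_leaf=1, max_features=None):
        pass

class LogisticRegression:
    def __init__(self, penalty="l2", C=1.0, tol=1e-4,
                 fit_intercept=True, dual=False, class_weight=None):
        pass

def accepted_params(cls):
    """Return the set of keyword names a classifier's __init__ accepts."""
    return set(inspect.signature(cls.__init__).parameters) - {"self"}

# The parameters ATM chose for "dt" in the first error above:
chosen = {"n_jobs", "min_samples_leaf", "n_estimators",
          "criterion", "max_features", "max_depth"}

# Keywords the decision tree would reject (they belong to an ensemble):
print(chosen - accepted_params(DecisionTreeClassifier))
```

A guard like this before instantiation would surface the mismatch with a clearer message than the raw TypeError.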

I get similar issues when running against the example provided in the README. Here is a copy/paste of the entire error message:

Selector: <class 'btb.selection.uniform.Uniform'>
Tuner: <class 'btb.tuning.uniform.Uniform'>
Choosing hyperparameters...
Chose parameters for method knn:
    C = 0.000128015603097
    tol = 0.000148636727508
    fit_intercept = True
    penalty = l2
    _scale = True
    dual = True
    class_weight = auto
Creating classifier...
Testing classifier...
Error testing classifier: datarun=<ID = 5, dataset ID = 5, strategy = uniform__uniform, budget = classifier (100), status: running>
Traceback (most recent call last):
  File "atm/worker.py", line 440, in run_classifier
    model, performance = self.test_classifier(classifier_id, params)
  File "atm/worker.py", line 374, in test_classifier
    performance = wrapper.start()
  File "/home/kkarra/atm/atm/wrapper.py", line 97, in start
    self.make_pipeline()
  File "/home/kkarra/atm/atm/wrapper.py", line 383, in make_pipeline
    classifier = self.class_(**classifier_params)
  File "/home/kkarra/atm/venv/local/lib/python2.7/site-packages/sklearn/neighbors/classification.py", line 126, in __init__
    metric_params=metric_params, n_jobs=n_jobs, **kwargs)
TypeError: _init_params() got an unexpected keyword argument 'C'

Here, it seems that the KNN model is getting the wrong keywords. I'm not sure why the models are not being tuned with the appropriate keywords. Should I dig further to make sure the selected model receives the correct keywords, or is this a known bug from the port from the old environment to the new one?
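The failure mode in the traceback can be reproduced in isolation: Python raises exactly this kind of TypeError whenever a kwargs dict is unpacked into an `__init__` that does not declare one of its keys, which is what `self.class_(**classifier_params)` does in wrapper.py. A minimal sketch with a toy class (not ATM's real code):

```python
# Toy stand-in for an estimator with a fixed keyword list, similar in shape
# to sklearn.neighbors.KNeighborsClassifier (hypothetical, for illustration).
class ToyKNN:
    def __init__(self, n_neighbors=5, weights="uniform"):
        self.n_neighbors = n_neighbors
        self.weights = weights

# A parameter dict contaminated with a LogisticRegression keyword:
params = {"n_neighbors": 3, "C": 0.000128}

try:
    ToyKNN(**params)   # the equivalent of self.class_(**classifier_params)
except TypeError as exc:
    err = str(exc)     # "... got an unexpected keyword argument 'C'"

print(err)
```

So the TypeError is raised by the estimator itself, which points at the params dict being built for the wrong method upstream rather than at anything inside scikit-learn.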

thswear commented 6 years ago

The mismatch between the methods and the hyperparameters has been fixed. Please let us know if you still see any issues. Thanks!