ClimbsRocks / auto_ml

[UNMAINTAINED] Automated machine learning for analytics & production
http://auto-ml.readthedocs.io
MIT License

'FinalModelATC' object has no attribute 'feature_ranges' #126

Closed akodate closed 8 years ago

akodate commented 8 years ago

I'm trying to run your "Getting Started" example on the Numerai training data, and I'm getting the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-39-aab5c9ba7e0f> in <module>()
      6 # Can pass in type_of_estimator='regressor' as well
      7 
----> 8 ml_predictor.train(df_dict)
      9 # Wait for the machine to learn all the complex and beautiful patterns in your data...
     10 

/Users/alex/anaconda/envs/dsi/lib/python2.7/site-packages/auto_ml/predictor.pyc in train(***failed resolving arguments***)
    553 
    554 
--> 555         self.perform_grid_search_by_model_names(estimator_names, scoring, X_df, y)
    556 
    557         # If we ran GridSearchCV, we will have to pick the best model

/Users/alex/anaconda/envs/dsi/lib/python2.7/site-packages/auto_ml/predictor.pyc in perform_grid_search_by_model_names(self, estimator_names, scoring, X_df, y)
    671 
    672             if self.ml_for_analytics and model_name in ('LogisticRegression', 'RidgeClassifier', 'LinearRegression', 'Ridge'):
--> 673                 self._print_ml_analytics_results_regression()
    674             elif self.ml_for_analytics and model_name in ['RandomForestClassifier', 'RandomForestRegressor', 'XGBClassifier', 'XGBRegressor', 'GradientBoostingRegressor', 'GradientBoostingClassifier']:
    675                 self._print_ml_analytics_results_random_forest()

/Users/alex/anaconda/envs/dsi/lib/python2.7/site-packages/auto_ml/predictor.pyc in _print_ml_analytics_results_regression(self)
    770             trained_coefficients = self.trained_pipeline.named_steps['final_model'].model.coef_
    771 
--> 772         feature_ranges = self.trained_pipeline.named_steps['final_model'].feature_ranges
    773 
    774         # TODO(PRESTON): readability. Can probably do this in a single zip statement.

AttributeError: 'FinalModelATC' object has no attribute 'feature_ranges'

Are you familiar with this type of issue?
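(For context on the crash: it comes from reading an attribute that is only set on the trained model object in some code paths. The class below is a minimal stand-in, not the real auto_ml implementation, that reproduces the failure mode and shows a defensive getattr lookup that degrades gracefully instead of raising:)

```python
# Stand-in for auto_ml's FinalModelATC: feature_ranges is only set
# when a particular flag is passed, so other code paths never create it.
class FinalModelATC(object):
    def fit(self, X, y, compute_ranges=False):
        if compute_ranges:
            # (min, max) per feature column, used by the analytics printout
            self.feature_ranges = [(min(col), max(col)) for col in zip(*X)]
        return self

model = FinalModelATC().fit([[1, 2], [3, 4]], [0, 1])

# Direct access raises AttributeError, just like the traceback above:
#   model.feature_ranges

# A defensive lookup returns a default instead of crashing:
ranges = getattr(model, 'feature_ranges', None)
print(ranges)  # None, since the attribute was never set
```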

ClimbsRocks commented 8 years ago

thanks for filing! i do know that issue, and thought i'd squashed it already. give me a minute to check on that, and fix it if it's still bugging.

ClimbsRocks commented 8 years ago

yep, found the bug. i wasn't thorough enough when squashing it last time. i'll have a fix out shortly.

akodate commented 8 years ago

Thanks, I'm looking forward to being able to try out auto_ml.

ClimbsRocks commented 8 years ago

just pushed the fix! i'll publish to pypi later tonight probably with a slew of other updates i'm making. but you can pull down from github for now.

ClimbsRocks commented 8 years ago

thanks again for filing this bug!

i'm honestly surprised the test_script runs at all. i haven't touched that in months, and the project's gone through some pretty significant performance tuning and new feature additions since the initial launch.

if you're up for it, i'd love a copy of your code to use as a quick example to show people how it's used! it shouldn't take nearly as many lines of code as test_script.py makes it seem.

akodate commented 8 years ago

For my initial attempt, I was just using the example you provided in the documentation (rather than the code in test_script.py):

from auto_ml import Predictor

col_desc_dictionary = {'target': 'output'}

ml_predictor = Predictor(type_of_estimator='classifier', column_descriptions=col_desc_dictionary)
ml_predictor.train(df_dict)
ml_predictor.predict(tournament_dict)

Thanks to the build you just pushed, the code works for me now (although I'm encountering trouble with other features like predict_proba—you may see me raise another issue.)

If you'd like to see my equivalent for test_script.py, I'd be happy to provide it to you. It should be quite short, as you predicted.

ClimbsRocks commented 8 years ago

ooh- i'd love to hear the issue you're running into with predict_proba! and to see your example script. thanks for the feedback so far.

akodate commented 8 years ago

My example script looks a bit like this:

import pandas as pd
from sklearn.cross_validation import train_test_split  # sklearn.model_selection in newer versions
from auto_ml import Predictor

df = pd.read_csv('your/path/to/numerai_training_data.csv')

training_data, testing_data = train_test_split(df, test_size=.2)

ml_predictor = Predictor(type_of_estimator='classifier', column_descriptions={'target': 'output'})
ml_predictor.train(training_data)

X_test = testing_data.drop('target', axis=1)
y_test = testing_data['target']
print(ml_predictor.score(X_test, y_test))

I didn't even know auto_ml could take dataframes until I saw test_script.py—that seems to be a (very useful) undocumented feature.

Unfortunately I only get a score of about -0.25 on the current numerai training data (even when I change the test set size), but if you have any suggestions I'd be very interested.

ClimbsRocks commented 8 years ago

@akodate: ah, crap- i hadn't realized that most people are probably interpreting the score for classifiers as accuracy, rather than the brier-score-loss.

i've been wanting to improve our scoring logging for a little while now, but this gives me all the impetus i need. i'll try to get that clarification pushed this weekend!

the numer.ai dataset is a bunch of stocks, so every percentage point of accuracy above 50% is a huge bump. but that said, you should be seeing some improvement above 50%. again though, the default scoring metric reported on is brier-score-loss, not accuracy.
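(A quick illustration of why that matters, using sklearn's brier_score_loss: the Brier score is the mean squared error of the predicted probabilities, so lower is better, and always predicting 0.5 yields exactly 0.25. A reported score of about -0.25 is therefore consistent with near-coin-flip predictions, with the sign flipped by the common scorer convention of negating losses so that higher is always better. The probability values below are made up for the example:)

```python
from sklearn.metrics import brier_score_loss

y_true = [0, 1, 1, 0]

# Always predicting 0.5 gives a Brier score of exactly 0.25
p_coin = [0.5, 0.5, 0.5, 0.5]
print(brier_score_loss(y_true, p_coin))  # 0.25

# Confident, well-calibrated probabilities score much lower
p_good = [0.1, 0.8, 0.9, 0.2]
print(brier_score_loss(y_true, p_good))  # 0.025
```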