ClimbsRocks / auto_ml

[UNMAINTAINED] Automated machine learning for analytics & production
MIT License

'FinalModelATC' object has no attribute 'feature_ranges' #126

Closed akodate closed 8 years ago

akodate commented 8 years ago

I'm trying to run your "Getting Started" example on the numerai training data and getting the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-39-aab5c9ba7e0f> in <module>()
      6 # Can pass in type_of_estimator='regressor' as well
----> 8 ml_predictor.train(df_dict)
      9 # Wait for the machine to learn all the complex and beautiful patterns in your data...

/Users/alex/anaconda/envs/dsi/lib/python2.7/site-packages/auto_ml/predictor.pyc in train(***failed resolving arguments***)
--> 555         self.perform_grid_search_by_model_names(estimator_names, scoring, X_df, y)
    557         # If we ran GridSearchCV, we will have to pick the best model

/Users/alex/anaconda/envs/dsi/lib/python2.7/site-packages/auto_ml/predictor.pyc in perform_grid_search_by_model_names(self, estimator_names, scoring, X_df, y)
    672             if self.ml_for_analytics and model_name in ('LogisticRegression', 'RidgeClassifier', 'LinearRegression', 'Ridge'):
--> 673                 self._print_ml_analytics_results_regression()
    674             elif self.ml_for_analytics and model_name in ['RandomForestClassifier', 'RandomForestRegressor', 'XGBClassifier', 'XGBRegressor', 'GradientBoostingRegressor', 'GradientBoostingClassifier']:
    675                 self._print_ml_analytics_results_random_forest()

/Users/alex/anaconda/envs/dsi/lib/python2.7/site-packages/auto_ml/predictor.pyc in _print_ml_analytics_results_regression(self)
    770             trained_coefficients = self.trained_pipeline.named_steps['final_model'].model.coef_
--> 772         feature_ranges = self.trained_pipeline.named_steps['final_model'].feature_ranges
    774         # TODO(PRESTON): readability. Can probably do this in a single zip statement.

AttributeError: 'FinalModelATC' object has no attribute 'feature_ranges'

Are you familiar with this type of issue?

ClimbsRocks commented 8 years ago

thanks for filing! i do know that issue, and thought i'd squashed it already. give me a minute to check on that, and fix it if it's still bugging.

ClimbsRocks commented 8 years ago

yep, found the bug. i wasn't thorough enough when squashing it last time. i'll have a fix out shortly.

akodate commented 8 years ago

Thanks, I'm looking forward to being able to try out auto_ml.

ClimbsRocks commented 8 years ago

just pushed the fix! i'll publish to pypi later tonight probably with a slew of other updates i'm making. but you can pull down from github for now.

ClimbsRocks commented 8 years ago

thanks again for filing this bug!

i'm honestly surprised the test_script runs at all. i haven't touched that in months, and the project's gone through some pretty significant performance tuning and new feature additions since the initial launch.

if you're up for it, i'd love a copy of your code to use as a quick example to show people how it's used! it shouldn't take nearly as many lines of code as makes it seem.

akodate commented 8 years ago

For my initial attempt, I was just using the example you provided in the documentation (rather than the code in

from auto_ml import Predictor

col_desc_dictionary = {'target': 'output'}

ml_predictor = Predictor(type_of_estimator='classifier', column_descriptions=col_desc_dictionary)

Thanks to the build you just pushed, the code works for me now (although I'm encountering trouble with other features like predict_proba—you may see me raise another issue.)

If you'd like to see my equivalent for, I'd be happy to provide it to you. It should be quite short, as you predicted.

ClimbsRocks commented 8 years ago

ooh- i'd love to hear the issue you're running into with predict_proba! and to see your example script. thanks for the feedback so far.

akodate commented 8 years ago

My example script looks a bit like this:

import pandas as pd
# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection replaces it
from sklearn.model_selection import train_test_split
from auto_ml import Predictor

df = pd.read_csv('your/path/to/numerai_training_data.csv')

training_data, testing_data = train_test_split(df, test_size=.2)

ml_predictor = Predictor(type_of_estimator='classifier', column_descriptions={'target': 'output'})
# Fit the predictor on the training split before scoring it
ml_predictor.train(training_data)

X_test = testing_data.drop('target', axis=1)
y_test = testing_data['target']
print(ml_predictor.score(X_test, y_test))

I didn't even know auto_ml could take dataframes until I saw—that seems to be a (very useful) undocumented feature.

Unfortunately I only get a score of about -0.25 on the current numerai training data (even when I change the test set size), but if you have any suggestions I'd be very interested.

ClimbsRocks commented 8 years ago

@akodate: ah, crap- i hadn't realized that most people are probably interpreting the score for classifiers as accuracy, rather than the brier-score-loss.

i've been wanting to improve our scoring logging for a little while now, but this gives me all the impetus i need. i'll try to get that clarification pushed this weekend!

the dataset is a bunch of stocks, so every percentage point of accuracy above 50% is a huge bump. but that said, you should be seeing some improvement above 50%. again though, the default scoring metric reported on is brier-score-loss, not accuracy.
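
To make the difference between the two metrics concrete, here is a minimal sketch using scikit-learn's brier_score_loss and accuracy_score directly rather than auto_ml's own scoring code. The labels and probabilities are made up for illustration and are not the numerai data. It also assumes the reported score is the negated loss (so that higher is better), which would explain a value near -0.25 but is not confirmed anywhere in this thread.

```python
import numpy as np
from sklearn.metrics import accuracy_score, brier_score_loss

# Hypothetical balanced binary labels (illustration only, not the numerai data).
y_true = np.array([0, 1] * 50)

# A model with no signal: it predicts a probability of 0.5 for every row.
p_uninformed = np.full(len(y_true), 0.5)
brier = brier_score_loss(y_true, p_uninformed)
print(brier)   # 0.25
print(-brier)  # -0.25, the value a negated-loss score would show (assumption)

# A model that always leans the right way, but only slightly (0.55 vs 0.45).
p_slight = np.where(y_true == 1, 0.55, 0.45)
brier_slight = brier_score_loss(y_true, p_slight)
acc = accuracy_score(y_true, (p_slight > 0.5).astype(int))
print(round(brier_slight, 4))  # 0.2025
print(acc)                     # 1.0
```

The Brier score loss is the mean squared difference between the predicted probability and the 0/1 outcome, so lower is better and 0.25 is what a constant 0.5 prediction earns. The second model gets every label right, yet its loss only drops to about 0.2025, because the metric rewards confident, well-calibrated probabilities rather than correct labels alone.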