ClimbsRocks / auto_ml

[UNMAINTAINED] Automated machine learning for analytics & production
http://auto-ml.readthedocs.io
MIT License

Issue with NLP data. #363

Closed shaneconner closed 6 years ago

shaneconner commented 6 years ago

I'm unable to finish training a model that uses only NLP data: it keeps freezing while attempting to calculate the feature responses. Here's the resulting error:

AttributeError: 'DataFrame' object has no attribute 'Importance', 'Delta', 'FR_Decrementing', 'FR_Incrementing', 'FRD_abs', 'FRI_abs'

I believe this happens because NLP data produces a sparse matrix in which most values in each column are NaN. That skews the statistical calculations, so the feature response DataFrame cannot be built.

To work around this, I set ml_for_analytics=False, but I still receive the following error:

AttributeError: 'DataFrame' object has no attribute 'FR_Incrementing'
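
For context, pandas raises this kind of AttributeError whenever code uses attribute-style access (df.SomeColumn) on a column that was never created. The snippet below is a minimal sketch of that behavior, assuming pandas is installed. It is not auto_ml's own code: the feature_responses table and the get_column_or_none helper are hypothetical, and only the column name FR_Incrementing is taken from the error above.

```python
import pandas as pd

# Hypothetical stand-in for a feature-response table in which the
# 'FR_Incrementing' column was never created.
feature_responses = pd.DataFrame({"Feature Name": ["word_count"], "Delta": [0.5]})


def get_column_or_none(df, name):
    """Return df[name] if the column exists, else None (hypothetical helper)."""
    return df[name] if name in df.columns else None


try:
    # Attribute-style access to a missing column raises AttributeError.
    feature_responses.FR_Incrementing
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
print(get_column_or_none(feature_responses, "FR_Incrementing"))
```

Checking `name in df.columns` before reading a column is one defensive pattern; whether the library's actual fix works this way is not stated in this thread.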

Any effort to resolve this error is appreciated. Great work on this by the way!

ClimbsRocks commented 6 years ago

oh sweet- thanks for letting me know! odd that that didn't get picked up by the test suite. i'll try to fix that up today or tomorrow, and certainly before next monday, unless the issue isn't what i expect it to be

black-snow commented 6 years ago

Sounds related to https://github.com/ClimbsRocks/auto_ml/issues/366

HalaKuwatly commented 6 years ago

is this fixed? I am getting the same error

ClimbsRocks commented 6 years ago

sorry 'bout the delay! testing a fix for this now. should be released tonight.

ClimbsRocks commented 6 years ago

fixed in last night's v2.9.9 release available on pypi (pip install --upgrade auto_ml)
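
After upgrading, one way to confirm that the installed release is at least v2.9.9 is to read the package metadata. This is a rough sketch using only the standard library. It assumes Python 3.8 or newer (importlib.metadata is not available on the Python 3.5 environment shown later in this thread), that the distribution is registered under the name auto_ml, and that its version string is purely numeric. The helper names parse_version and has_fix are made up for illustration.

```python
from importlib import metadata  # standard library in Python 3.8+

FIXED_IN = (2, 9, 9)  # release named in the comment above


def parse_version(text):
    """Turn a plain dotted version such as '2.9.9' into a tuple of ints.

    Assumes purely numeric components (no 'rc1'-style suffixes).
    """
    return tuple(int(part) for part in text.split("."))


def has_fix(package="auto_ml"):
    """Return True/False if the package is installed, None if it is not."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return parse_version(installed) >= FIXED_IN


print(has_fix())
```

Comparing integer tuples rather than raw strings avoids the trap where "2.9.10" sorts before "2.9.9" alphabetically.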

HalaKuwatly commented 6 years ago

Thanks. Can you give a working code example of a classification task with NLP data? I can't get it to work. Specifically, here is what I'm getting:

Here are the results from our GradientBoostingClassifier
predicting class
Calculating feature responses, for advanced analytics.
/anaconda/envs/py35/lib/python3.5/site-packages/sklearn/model_selection/_split.py:2026: FutureWarning: From version 0.21, test_size will always complement train_size unless both are specified.
  FutureWarning)