AutoViML / Auto_ViML

Automatically Build Multiple ML Models with a Single Line of Code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.
Apache License 2.0
518 stars, 101 forks

getting ValueError when running notebook with XGBoost on Titanic dataset. #3

Closed dsbyprateekg closed 4 years ago

dsbyprateekg commented 4 years ago

Hi,

Thanks for sharing your work! I just tested the Titanic dataset downloaded from https://www.kaggle.com/c/titanic/data with XGBoost, as below:

m, feats, trainm, testm = Auto_ViML(train, target, test, sample_submission, scoring_parameter=scoring_parameter, hyper_param='GS', feature_reduction=True, Boosting_Flag=True, Binning_Flag=False, Add_Poly=0, Stacking_Flag=False, Imbalanced_Flag=False, verbose=1)

When I ran the above code, I got the error below:

ValueError: DataFrame.dtypes for data must be int, float or bool. Did not expect the data types in fields Name

The same error seems to occur with Boosting_Flag=None. The console log just before the error is below:

Train (Size: 891,12) has Single_Label with target: ['Survived']
" ################### Binary-Class ##################### "
Shuffling the data set before training
Class -> Counts -> Percent
1: 342 -> 38.4%
0: 549 -> 61.6%
Selecting 2-Class Classifier...
Using GridSearchCV for Hyper Parameter tuning...
Target Survived is already numeric. No transformation done.
Top columns in Train with missing values: ['Cabin', 'Age', 'Embarked'] and their missing value totals: [687, 177, 2]
Classifying variables in data set...
Number of Numeric Columns = 2
Number of Integer-Categorical Columns = 3
Number of String-Categorical Columns = 1
Number of Factor-Categorical Columns = 0
Number of String-Boolean Columns = 1
Number of Numeric-Boolean Columns = 0
Number of Discrete String Columns = 2
Number of NLP String Columns = 0
Number of Date Time Columns = 0
Number of ID Columns = 2
Number of Columns to Delete = 0
11 Predictors classified...
This does not include the Target column(s)
2 variables removed since they were some ID or low-information variables
Completed Label Encoding, Missing Value Imputing and Scaling of data without errors.
No Missing values in Train
Test data has no missing values
Number of numeric variables = 5
No variables were removed since no highly correlated variables found in data

Data Ready for Modeling with Target variable = Survived
Starting Selection among 11 predictors...
Number of numeric variables = 5
No variables were removed since no highly correlated variables found in data
Adding 6 categorical variables to reduced numeric variables of 5
Selected No. of variables = 11
Finding Important Features... in 11 variables

AutoViML commented 4 years ago

Thanks, Prateek. Sorry, a typo caused the error. It has now been fixed. Please run:

pip install autoviml --upgrade

That should fix the bug. Please try it and let me know. Ram

dsbyprateekg commented 4 years ago

Hi Ram,

I have updated the library and restarted the notebook, but I am getting the same error.

deneshkumar commented 4 years ago

@AutoViML First of all, I would like to congratulate you on your great work. This AutoML library has huge potential to disrupt the AutoML domain.

Getting back to the issue: I ran into the same error, and I believe it occurs because the XGBoost classifier does not auto-encode categorical features the way LightGBM does. This needs to be corrected by encoding the categorical values in "preds".
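The encoding suggested above could look roughly like the following. This is a minimal sketch, not Auto_ViML's actual fix: the helper name `encode_non_numeric` and the three-row sample frame are made up for illustration, and `pd.factorize` is just one simple way to turn string columns into integer codes so that every column ends up with an int, float or bool dtype.

```python
import pandas as pd


def encode_non_numeric(df):
    """Return a copy of df with every non-numeric column integer-encoded.

    Hypothetical helper for illustration only. Missing values are
    encoded as -1, which is how pd.factorize marks them.
    """
    out = df.copy()
    mappings = {}
    # Columns that are neither numeric nor boolean are the ones a
    # dtype check for "int, float or bool" would reject.
    for col in out.select_dtypes(exclude=["number", "bool"]).columns:
        codes, uniques = pd.factorize(out[col])
        out[col] = codes
        mappings[col] = list(uniques)  # code i stands for uniques[i]
    return out, mappings


# Made-up rows shaped like a few Titanic columns.
sample = pd.DataFrame(
    {
        "Name": ["Passenger A", "Passenger B", "Passenger C"],
        "Sex": ["male", "female", "female"],
        "Fare": [7.25, 71.28, 7.92],
    }
)
encoded, mappings = encode_non_numeric(sample)
print(encoded.dtypes)
```

Integer codes for a free-text column such as Name carry no real signal, so in practice such a column might be dropped or turned into features instead; the sketch only shows how the dtype complaint can be avoided.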

AutoViML commented 4 years ago

Can you please send me an email at the address on my GitHub page? I’d like to check whether this new update works, since I tested it on the same dataset that I downloaded from Kaggle. Thanks, Ram

deneshkumar commented 4 years ago

@dsbyprateekg Try once with "Boosting_Flag" set to False.

Setting Boosting_Flag to True will enable the "CatBoost" model, which does not include label encoding as of now.

AutoViML commented 4 years ago

Actually, this should now be fixed. Try this command in your command shell to upgrade autoviml:

pip3 install --upgrade --ignore-installed --no-deps autoviml

AutoViML

dsbyprateekg commented 4 years ago

Hi Ram,

I have updated the library again and verified its version, which is 1.0.45, but I am getting the same error in the notebook. The issue is not solved yet. [image: version] [image: after_update]

dsbyprateekg commented 4 years ago

No, Ram, the issue is not solved yet. After updating again, I am getting the same error.

AutoViML commented 4 years ago

Try it now. It has been fixed for a while. Make sure you uninstall old versions by doing: pip uninstall autoviml

You can reinstall by: pip install autoviml —no-cache-dir —ignore-installed

Let me know. All the best.

dsbyprateekg commented 4 years ago

I followed the steps, and after running the notebook I get the following error:

TypeError: Categorical is not ordered for operation min
you can use .as_ordered() to change the Categorical to an ordered one

The complete log is below:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
in
      7 Add_Poly=0, Stacking_Flag=False,
      8 Imbalanced_Flag=False,
----> 9 verbose=1)

D:\Projects\Auto_ViML-master\autoviml\Auto_ViML.py in Auto_ViML(train, target, test, sample_submission, hyper_param, feature_reduction, scoring_parameter, Boosting_Flag, KMeans_Featurizer, Add_Poly, Stacking_Flag, Binning_Flag, Imbalanced_Flag, verbose)
    523 numvars.append(col)
    524 ### for all numeric variables, fill missing values with 1 less than min.
--> 525 fill_num = start_train[col].min() - 1
    526 if start_train[col].isnull().sum() > 0:
    527 start_train[col] = start_train[col].fillna(fill_num)

c:\users\prateek.g\appdata\local\continuum\anaconda3\envs\mynewenv\lib\site-packages\pandas\core\generic.py in stat_func(self, axis, skipna, level, numeric_only, **kwargs)
  11616 return self._agg_by_level(name, axis=axis, level=level, skipna=skipna)
  11617 return self._reduce(
> 11618 f, name, axis=axis, skipna=skipna, numeric_only=numeric_only
  11619 )
  11620

c:\users\prateek.g\appdata\local\continuum\anaconda3\envs\mynewenv\lib\site-packages\pandas\core\series.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
   4069 # TODO deprecate numeric_only argument for Categorical and use
   4070 # skipna as well, see GH25303
-> 4071 return delegate._reduce(name, numeric_only=numeric_only, **kwds)
   4072 elif isinstance(delegate, ExtensionArray):
   4073 # dispatch to ExtensionArray interface

c:\users\prateek.g\appdata\local\continuum\anaconda3\envs\mynewenv\lib\site-packages\pandas\core\arrays\categorical.py in _reduce(self, name, axis, **kwargs)
   2259 msg = "Categorical cannot perform the operation {op}"
   2260 raise TypeError(msg.format(op=name))
-> 2261 return func(**kwargs)
   2262
   2263 def min(self, numeric_only=None, **kwargs):

c:\users\prateek.g\appdata\local\continuum\anaconda3\envs\mynewenv\lib\site-packages\pandas\core\arrays\categorical.py in min(self, numeric_only, **kwargs)
   2276 min : the minimum of this `Categorical`
   2277 """
-> 2278 self.check_for_ordered("min")
   2279 if numeric_only:
   2280 good = self._codes != -1

c:\users\prateek.g\appdata\local\continuum\anaconda3\envs\mynewenv\lib\site-packages\pandas\core\arrays\categorical.py in check_for_ordered(self, op)
   1584 "Categorical is not ordered for operation {op}\n"
   1585 "you can use .as_ordered() to change the "
-> 1586 "Categorical to an ordered one\n".format(op=op)
   1587 )
   1588

TypeError: Categorical is not ordered for operation min
you can use .as_ordered() to change the Categorical to an ordered one
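The traceback points at line 525 of Auto_ViML.py, where `min()` is called on a column that pandas holds as an unordered Categorical. One way to avoid this kind of error is to apply the "one less than min" fill only to columns with a numeric dtype. The sketch below is an assumption about how such a guard could look, not the maintainer's fix; `fill_numeric_missing` and the sample frame are hypothetical.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def fill_numeric_missing(df):
    """Fill missing values with (column min - 1), numeric columns only.

    Hypothetical helper: non-numeric columns, including unordered
    Categoricals, are left untouched so min() is never called on them.
    """
    out = df.copy()
    for col in out.columns:
        if is_numeric_dtype(out[col]) and out[col].isnull().any():
            out[col] = out[col].fillna(out[col].min() - 1)
    return out


# Made-up frame: one numeric column and one unordered categorical column.
frame = pd.DataFrame(
    {
        "Age": [22.0, None, 38.0],
        "Embarked": pd.Series(["S", None, "C"], dtype="category"),
    }
)
filled = fill_numeric_missing(frame)
print(filled)
```

Calling `frame["Embarked"].min()` directly raises the same TypeError as in the log, which is why the dtype check comes first.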
AutoViML commented 4 years ago

Your install command is wrong. It must be: pip install autoviml --no-cache-dir --ignore-installed

(note the double dashes). Once again, run "pip uninstall autoviml" and then reinstall using the command above. Then it should work. Ram
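A pasted command can end up with an em dash (U+2014) or en dash (U+2013) where two ASCII hyphens were intended, typically because an editor or mail client auto-corrected them. pip would then most likely read the argument as a package name rather than as an option. The sketch below flags and repairs such arguments before a command is run. The function names are hypothetical, and mapping both dash characters to `--` is an assumption that fits long options like the ones in this thread.

```python
# Typographic dashes that often replace "--" after auto-correction.
DASH_LOOKALIKES = ("\u2014", "\u2013")  # em dash, en dash


def find_bad_dashes(command):
    """Return the arguments that start with a typographic dash."""
    return [arg for arg in command.split() if arg.startswith(DASH_LOOKALIKES)]


def repair_dashes(command):
    """Replace a leading typographic dash on each argument with '--'."""
    fixed = []
    for arg in command.split():
        if arg.startswith(DASH_LOOKALIKES):
            arg = "--" + arg[1:]
        fixed.append(arg)
    return " ".join(fixed)


pasted = "pip install autoviml \u2014no-cache-dir \u2014ignore-installed"
print(find_bad_dashes(pasted))
print(repair_dashes(pasted))
```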

On Mon, Jan 6, 2020 at 10:58 PM Prateek Gupta notifications@github.com wrote:

[image: uninstall] https://user-images.githubusercontent.com/30830541/71867294-a3d39d80-312f-11ea-81d4-457ecea061c8.JPG I uninstalled it and tried to reinstall it using the command you gave, but I got an error; a screenshot of the error is attached.

Then I reinstalled it using pip install autoviml and ran the notebook, but I got the following error: ValueError: DataFrame.dtypes for data must be int, float or bool. Did not expect the data types in fields Name

dsbyprateekg commented 4 years ago

Hi Ram, I did the same, yet I am still getting the error.

Naseer5543 commented 4 years ago

@dsbyprateekg, I have uninstalled and reinstalled the latest package. It is working fine now on Spyder. @rsesha, thanks a lot, Mr. Ram, for the fix.

dsbyprateekg commented 4 years ago

It seems the issue occurs only with Jupyter + Windows, because I have tried multiple times and keep getting the same error.
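When an upgrade appears to have no effect inside a notebook, one thing worth checking is which copy of the package the kernel actually imports. The traceback earlier in this thread shows Auto_ViML.py loaded from D:\Projects\Auto_ViML-master rather than from site-packages, which suggests (but does not prove) that a local checkout was shadowing the pip-installed version. The sketch below is a generic diagnostic using only the standard library; `describe_install` is a hypothetical helper name, and it is meant to be run in a notebook cell.

```python
import importlib.util
import sys
from importlib import metadata


def describe_install(module_name, dist_name=None):
    """Report the interpreter, import location and installed version.

    module_name is the importable name; dist_name is the name used with
    pip, if it differs. Missing pieces are reported as None.
    """
    spec = importlib.util.find_spec(module_name)
    try:
        version = metadata.version(dist_name or module_name)
    except metadata.PackageNotFoundError:
        version = None
    return {
        "interpreter": sys.executable,  # Python running this kernel
        "module_path": spec.origin if spec is not None else None,
        "dist_version": version,  # version pip installed here
    }


if __name__ == "__main__":
    print(describe_install("autoviml"))
```

If `module_path` points outside the environment's site-packages, or `interpreter` differs from the Python that pip used, the notebook is not running the upgraded package.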

rsesha commented 4 years ago

Prateek: You might want to go to Colab.research.google.com and use it from there. Another option is to show your error to someone who is a Python expert and also knows a bit about Windows shell commands. I am pretty sure these two options can fix it for you. I am closing the issue for now. Thanks for letting me know. Ram

dsbyprateekg commented 4 years ago

Yes, Ram, you can close this issue, since it is not reproducible by others.

AutoViML commented 4 years ago

Yes, this issue is now fixed and closed.