getting ValueError when running notebook with XGBoost on Titanic dataset.

dsbyprateekg commented 5 years ago

Hi,

Thanks for sharing your work! I just tested the Titanic dataset downloaded from https://www.kaggle.com/c/titanic/data with XGBoost, as below:

m, feats, trainm, testm = Auto_ViML(train, target, test, sample_submission, scoring_parameter=scoring_parameter, hyper_param='GS', feature_reduction=True, Boosting_Flag=True, Binning_Flag=False, Add_Poly=0, Stacking_Flag=False, Imbalanced_Flag=False, verbose=1)

When I ran the above code, I got the following error:

ValueError: DataFrame.dtypes for data must be int, float or bool. Did not expect the data types in fields Name

The same error seems to occur with Boosting_Flag=None. The console log just prior to the error is below:

Train (Size: 891,12) has Single_Label with target: ['Survived']
" ################### Binary-Class ##################### "
Shuffling the data set before training
Class -> Counts -> Percent
1: 342 -> 38.4%
0: 549 -> 61.6%
Selecting 2-Class Classifier...
Using GridSearchCV for Hyper Parameter tuning...
Target Survived is already numeric. No transformation done.
Top columns in Train with missing values: ['Cabin', 'Age', 'Embarked']
and their missing value totals: [687, 177, 2]
Classifying variables in data set...
Number of Numeric Columns = 2
Number of Integer-Categorical Columns = 3
Number of String-Categorical Columns = 1
Number of Factor-Categorical Columns = 0
Number of String-Boolean Columns = 1
Number of Numeric-Boolean Columns = 0
Number of Discrete String Columns = 2
Number of NLP String Columns = 0
Number of Date Time Columns = 0
Number of ID Columns = 2
Number of Columns to Delete = 0
11 Predictors classified...
This does not include the Target column(s)
2 variables removed since they were some ID or low-information variables
Completed Label Encoding, Missing Value Imputing and Scaling of data without errors.
No Missing values in Train
Test data has no missing values
Number of numeric variables = 5
No variables were removed since no highly correlated variables found in data

Data Ready for Modeling with Target variable = Survived
Starting Selection among 11 predictors...
Number of numeric variables = 5
No variables were removed since no highly correlated variables found in data
Adding 6 categorical variables to reduced numeric variables of 5
Selected No. of variables = 11
Finding Important Features... in 11 variables
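
The ValueError reported above complains about a column whose dtype is not int, float or bool. A quick way to see which columns would trigger it is to list the non-numeric columns of the frame just before it is handed to the model. This is only a diagnostic sketch: `non_numeric_columns` is a hypothetical helper, not part of Auto_ViML, and the small frame is made-up data shaped like the Titanic columns.

```python
import pandas as pd


def non_numeric_columns(df: pd.DataFrame) -> list:
    """Return the names of columns whose dtype is not int, float or bool."""
    return [
        col
        for col in df.columns
        if not (
            pd.api.types.is_numeric_dtype(df[col])
            or pd.api.types.is_bool_dtype(df[col])
        )
    ]


# Made-up rows for illustration only.
sample = pd.DataFrame(
    {
        "Name": ["Braund", "Cumings", "Heikkinen"],
        "Age": [22.0, 38.0, 26.0],
        "Pclass": [3, 1, 3],
        "Alone": [False, False, True],
    }
)
print(non_numeric_columns(sample))
```

Any column name printed here would need to be encoded or dropped before training.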

AutoViML commented 5 years ago

Thanks, Prateek. Sorry, there was a typo that caused the error. It has now been fixed. Please run:

pip install autoviml --upgrade

That should fix the bug. Please try it and let me know. Ram

dsbyprateekg commented 5 years ago

Hi Ram,

I have updated the library and restarted the notebook, but I am still getting the same error.

deneshkumar commented 5 years ago

@AutoViML First of all, I would like to congratulate you on your great work. This AutoML library has huge potential to disrupt the AutoML domain.

Getting back to the issue: I faced the same error, and I believe it occurs because the XGBoost classifier does not auto-encode categorical features the way LightGBM does. This needs to be corrected by encoding the categorical values in "preds".
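
One way to apply the encoding suggested above is to replace every non-numeric column with integer codes before the data reaches the model. The sketch below uses `pandas.factorize`. It is an illustration under assumptions, not the fix that was shipped in the library: `encode_object_columns` is a hypothetical helper and the rows are made up.

```python
import pandas as pd


def encode_object_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which each non-numeric column holds integer codes.

    Missing values are mapped to -1 by pandas.factorize.
    """
    out = df.copy()
    for col in out.columns:
        is_numeric = pd.api.types.is_numeric_dtype(out[col])
        is_bool = pd.api.types.is_bool_dtype(out[col])
        if not (is_numeric or is_bool):
            codes, _ = pd.factorize(out[col])
            out[col] = codes
    return out


# Made-up rows for illustration only.
raw = pd.DataFrame(
    {
        "Name": ["Braund", "Cumings", "Braund"],
        "Age": [22.0, 38.0, 26.0],
        "Embarked": ["S", None, "S"],
    }
)
encoded = encode_object_columns(raw)
print(encoded.dtypes)
```

Integer codes impose an arbitrary order on the categories, which tree-based models usually tolerate; one-hot encoding is an alternative when that order is a concern.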

AutoViML commented 5 years ago

Can you please send me an email at the address that is on my GitHub page? I'd like to check whether this new update works, since I tested it on the same dataset that I downloaded from Kaggle. Thanks, Ram

deneshkumar commented 5 years ago

@dsbyprateekg Try once with "Boosting_Flag" as False

Setting Boosting_Flag to True will enable the "CatBoost" model, which does not include label encoding as of now.

AutoViML commented 5 years ago

Actually, this should now be fixed. Try this command in your command shell to upgrade autoviml:

pip3 install --upgrade --ignore-installed --no-deps autoviml

AutoViML

dsbyprateekg commented 4 years ago

Hi Ram,

I have updated the library again and verified its version, which is 1.0.45, but I am still getting the same error in the notebook. The issue is not solved yet. [image: version] [image: after_update]
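
When a package is upgraded from a shell but the notebook still behaves like the old version, it can help to confirm the version from inside the notebook itself, because the kernel may run a different Python environment from the one pip installed into. The sketch below is a generic check with the standard library; `installed_version` is a hypothetical helper name, and whether this was the cause here is an assumption, not something the thread confirms.

```python
import sys
from importlib import metadata


def installed_version(package: str):
    """Return the version of `package` visible to this interpreter, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Run this in a notebook cell: the path shows which Python the kernel uses.
print("interpreter:", sys.executable)
print("autoviml:", installed_version("autoviml"))
```

If the interpreter path differs from the environment where pip ran, installing with that interpreter's own pip (`python -m pip install ...`) targets the right environment.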

dsbyprateekg commented 4 years ago

No, Ram, the issue is not solved yet. After updating again, I am getting the same error.

AutoViML commented 4 years ago

Try it now. It has been fixed for a while. Make sure you uninstall old versions by doing: pip uninstall autoviml

You can reinstall by: pip install autoviml --no-cache-dir --ignore-installed

Let me know. All the best.

dsbyprateekg commented 4 years ago

I followed the steps, and after running the notebook I am getting the following error: TypeError: Categorical is not ordered for operation min you can use .as_ordered() to change the Categorical to an ordered one

The complete log is below:

`---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
in
      7 Add_Poly=0, Stacking_Flag=False,
      8 Imbalanced_Flag=False,
----> 9 verbose=1)

D:\Projects\Auto_ViML-master\autoviml\Auto_ViML.py in Auto_ViML(train, target, test, sample_submission, hyper_param, feature_reduction, scoring_parameter, Boosting_Flag, KMeans_Featurizer, Add_Poly, Stacking_Flag, Binning_Flag, Imbalanced_Flag, verbose)
    523 numvars.append(col)
    524 ### for all numeric variables, fill missing values with 1 less than min.
--> 525 fill_num = start_train[col].min() - 1
    526 if start_train[col].isnull().sum() > 0:
    527 start_train[col] = start_train[col].fillna(fill_num)

c:\users\prateek.g\appdata\local\continuum\anaconda3\envs\mynewenv\lib\site-packages\pandas\core\generic.py in stat_func(self, axis, skipna, level, numeric_only, **kwargs)
  11616 return self._agg_by_level(name, axis=axis, level=level, skipna=skipna)
  11617 return self._reduce(
> 11618 f, name, axis=axis, skipna=skipna, numeric_only=numeric_only
  11619 )
  11620

c:\users\prateek.g\appdata\local\continuum\anaconda3\envs\mynewenv\lib\site-packages\pandas\core\series.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
   4069 # TODO deprecate numeric_only argument for Categorical and use
   4070 # skipna as well, see GH25303
-> 4071 return delegate._reduce(name, numeric_only=numeric_only, **kwds)
   4072 elif isinstance(delegate, ExtensionArray):
   4073 # dispatch to ExtensionArray interface

c:\users\prateek.g\appdata\local\continuum\anaconda3\envs\mynewenv\lib\site-packages\pandas\core\arrays\categorical.py in _reduce(self, name, axis, **kwargs)
   2259 msg = "Categorical cannot perform the operation {op}"
   2260 raise TypeError(msg.format(op=name))
-> 2261 return func(**kwargs)
   2262
   2263 def min(self, numeric_only=None, **kwargs):

c:\users\prateek.g\appdata\local\continuum\anaconda3\envs\mynewenv\lib\site-packages\pandas\core\arrays\categorical.py in min(self, numeric_only, **kwargs)
   2276 min : the minimum of this `Categorical`
   2277 """
-> 2278 self.check_for_ordered("min")
   2279 if numeric_only:
   2280 good = self._codes != -1

c:\users\prateek.g\appdata\local\continuum\anaconda3\envs\mynewenv\lib\site-packages\pandas\core\arrays\categorical.py in check_for_ordered(self, op)
   1584 "Categorical is not ordered for operation {op}\n"
   1585 "you can use .as_ordered() to change the "
-> 1586 "Categorical to an ordered one\n".format(op=op)
   1587 )
   1588

TypeError: Categorical is not ordered for operation min
you can use .as_ordered() to change the Categorical to an ordered one`
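
The traceback above ends in `start_train[col].min()` being called on a column that pandas stores as an unordered Categorical, which pandas refuses to reduce with `min`. A minimal sketch of one possible guard is to apply the "one less than the minimum" fill only to columns that are actually numeric. `fill_numeric_missing` is a hypothetical helper written for illustration; it is not the code in Auto_ViML.py, and the rows are made up.

```python
import pandas as pd


def fill_numeric_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Fill NaNs in numeric columns with (column minimum - 1).

    Categorical, string and boolean columns are left untouched, so
    .min() is never called on an unordered Categorical.
    """
    out = df.copy()
    for col in out.columns:
        is_numeric = pd.api.types.is_numeric_dtype(out[col])
        is_bool = pd.api.types.is_bool_dtype(out[col])
        if is_numeric and not is_bool:
            fill_num = out[col].min() - 1
            out[col] = out[col].fillna(fill_num)
    return out


# Made-up rows for illustration only.
frame = pd.DataFrame(
    {
        "Age": [22.0, None, 30.0],
        "Pclass": pd.Series([1, 3, 1]).astype("category"),
    }
)
filled = fill_numeric_missing(frame)
print(filled)
```

Calling `frame["Pclass"].min()` directly on this frame raises the same TypeError as in the log, since the column is an unordered Categorical.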

AutoViML commented 4 years ago

Your install command is wrong. It must be: pip install autoviml --no-cache-dir --ignore-installed

(Note the double dashes.) Once again, run "pip uninstall autoviml" and then reinstall using the above command. Then it should work. Ram

On Mon, Jan 6, 2020 at 10:58 PM Prateek Gupta notifications@github.com wrote:

[image: uninstall] https://user-images.githubusercontent.com/30830541/71867294-a3d39d80-312f-11ea-81d4-457ecea061c8.JPG I uninstalled it and tried to reinstall it using the command you gave, but I got an error; a screenshot of the error is attached.

Then I reinstalled it using pip install autoviml and ran the notebook, but I am getting the following error: ValueError: DataFrame.dtypes for data must be int, float or bool. Did not expect the data types in fields Name

dsbyprateekg commented 4 years ago

Hi Ram, although I did the same, I am still getting the error.

Naseer5543 commented 4 years ago

@dsbyprateekg, I have uninstalled and reinstalled the latest package. It is working fine now on Spyder. @rsesha, thanks a lot, Mr. Ram, for the fix.

dsbyprateekg commented 4 years ago

It seems the issue occurs only with Jupyter on Windows, because I tried multiple times and keep getting the same error.

rsesha commented 4 years ago

Prateek: You might want to just go to Colab.research.google.com and use it from there. Another option is to show your error to someone who is a Python expert and also knows a bit about Windows shell commands. I am pretty sure these two options can fix it for you. I am closing the issue for now. Thanks for letting me know. Ram

dsbyprateekg commented 4 years ago

Yes, Ram, you can close this issue, since it is not reproducible by others.

AutoViML commented 4 years ago

Yes this issue is now fixed and closed.

AutoViML / Auto_ViML

getting ValueError when running notebook with XGBoost on Titanic dataset. #3