AutoViML / Auto_ViML

Automatically Build Multiple ML Models with a Single Line of Code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.
Apache License 2.0

AttributeError: module 'numpy' has no attribute 'bool'. #41

Open GDGauravDutta opened 6 months ago

GDGauravDutta commented 6 months ago

m, feats, trainm, testm = Auto_ViML(train, target='Exited',test=test, sample_submission=sol, KMeans_Featurizer=False, hyper_param='RS',feature_reduction=True, Boosting_Flag='CatBoost', Binning_Flag=False, Add_Poly=0, Stacking_Flag=False,Imbalanced_Flag=False, GPU_flag=False, verbose=2)

==================== System Information ====================
System: Linux
Node Name: 852e2a997c18
Release: 5.15.133+
Version: #1 SMP Tue Dec 19 13:14:11 UTC 2023
Machine: x86_64
Processor: x86_64
====================== CPU Information ======================
Physical cores: 2
Total cores: 4
==================== Memory Information ====================
Total: 31.36GB
Available: 29.87GB
Used: 1.05GB
############## D A T A S E T A N A L Y S I S #######################
Training Set Shape = (15000, 11)
Training Set Memory Usage = 0.36 MB
Test Set Shape = (10000, 10)
Test Set Memory Usage = 0.23 MB
Single_Label Target: ['Exited']
Random shuffling the data set before training
Using RandomizedSearchCV for Hyper Parameter Tuning. This is 3X faster than GridSearchCV...
Class -> Counts -> Percent
0: 11937 -> 79.6%
1: 3063 -> 20.4%
Target Exited is already numeric. No transformation done.
#######################################################################################
######################## C L A S S I F Y I N G V A R I A B L E S ####################
#######################################################################################
Classifying variables in data set...
Printing upto 30 columns max in each category:
Numeric Columns : ['Balance', 'EstimatedSalary']
Integer-Categorical Columns: ['CreditScore', 'Age', 'Tenure', 'NumOfProducts']
String-Categorical Columns: ['Geography']
Factor-Categorical Columns: []
String-Boolean Columns: []
Numeric-Boolean Columns: ['Gender', 'HasCrCard', 'IsActiveMember']
Discrete String Columns: []
NLP text Columns: []
Date Time Columns: []
ID Columns: []
Columns that will not be considered in modeling: []
10 Predictors classified...
Data Set Shape: 15000 rows, 10 cols
Additional details on columns:

/opt/conda/lib/python3.10/site-packages/autoviml/sulov_method.py in ?(df, preds_in, modeltype, target, corr_limit, verbose, dask_xgboost_flag)
    111 print('#######################################################################################')
    112 ### This is a shorter version of getting unduplicated and highly correlated vars ##
--> 113 correlation_dataframe = df.corr().abs().unstack().sort_values().drop_duplicates()
    114 corrdf = pd.DataFrame(correlation_dataframe[:].reset_index())

/opt/conda/lib/python3.10/site-packages/pandas/core/frame.py in ?(self, level, fill_value, sort)
   9926 DataFrame
   9927 A dataframe containing columns from both the caller and other.
-> 9928
   9929 See Also

TypeError: unstack() takes from 2 to 3 positional arguments but 4 were given

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
Cell In[36], line 1
----> 1 m, feats, trainm, testm = Auto_ViML(train, target='Exited',test=test,
      2                             sample_submission=sol,
      3                             KMeans_Featurizer=False,
      4                             hyper_param='RS',feature_reduction=True,
      5                             Boosting_Flag='CatBoost', Binning_Flag=False,
      6                             Add_Poly=0, Stacking_Flag=False,Imbalanced_Flag=False,
      7                             GPU_flag=False, verbose=2)

File /opt/conda/lib/python3.10/site-packages/autoviml/Auto_ViML.py:1339, in Auto_ViML(train, target, test, sample_submission, hyper_param, feature_reduction, scoring_parameter, Boosting_Flag, KMeans_Featurizer, Add_Poly, Stacking_Flag, Binning_Flag, Imbalanced_Flag, GPU_flag, verbose)
   1333     train_sel = FE_remove_variables_using_SULOV_method(train, red_preds,
   1334                             modeltype, each_target,
   1335                             corr_limit, verbose)
   1336 except:
   1337     #### if for some reason, the above blows up due to memory error, then try this
   1338     #### Dropping highly correlated Features fast using simple linear correlation ###
-> 1339     remove_list = remove_highly_correlated_vars_fast(train[num_vars], corr_limit)
   1340     train_sel = left_subtract(num_vars, remove_list)
   1341     num_vars = train[train_sel].select_dtypes(include=[np.float64, np.float32, np.float16]).columns.tolist()

File /opt/conda/lib/python3.10/site-packages/autoviml/sulov_method.py:37, in remove_highly_correlated_vars_fast(df, corr_limit)
     34 cor_matrix = df.corr().abs().astype(np.float16)
     35 # Selecting upper triangle of correlation matrix
     36 upper_tri = cor_matrix.where(np.triu(np.ones(cor_matrix.shape),
---> 37                              k=1).astype(np.bool))
     38 # Finding index of feature columns with correlation greater than 0.95
     39 to_drop = [column for column in upper_tri.columns if any(upper_tri[column] > corr_limit)]

File /opt/conda/lib/python3.10/site-packages/numpy/__init__.py:324, in __getattr__(attr)
    319     warnings.warn(
    320         f"In the future np.{attr} will be defined as the "
    321         "corresponding NumPy scalar.", FutureWarning, stacklevel=2)
    323 if attr in former_attrs:
--> 324     raise AttributeError(former_attrs[attr])
    326 if attr == 'testing':
    327     import numpy.testing as testing

AttributeError: module 'numpy' has no attribute 'bool'. np.bool was a deprecated alias for the builtin bool. To avoid this error in existing code, use bool by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.bool_ here. The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
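
The error message itself suggests the fix: replace `np.bool` with the builtin `bool` in the upper-triangle mask shown at line 37 of `sulov_method.py`. The following is a minimal sketch of that change, not the library's actual patch. The function name `highly_correlated_columns` is hypothetical, and returning the list of columns to drop is an assumption based on how the result is assigned to `remove_list` in the traceback.

```python
import numpy as np
import pandas as pd


def highly_correlated_columns(df, corr_limit):
    """Sketch of the fallback correlation filter with np.bool replaced by bool.

    Hypothetical stand-in for the logic visible in the traceback; the return
    value (a list of column names) is an assumption.
    """
    cor_matrix = df.corr().abs().astype(np.float16)
    # Boolean mask for the strict upper triangle of the correlation matrix.
    # The builtin bool is used because the np.bool alias no longer exists.
    mask = np.triu(np.ones(cor_matrix.shape), k=1).astype(bool)
    upper_tri = cor_matrix.where(mask)
    # A column is flagged if it correlates above corr_limit with any earlier column.
    return [column for column in upper_tri.columns
            if any(upper_tri[column] > corr_limit)]


if __name__ == "__main__":
    demo = pd.DataFrame({
        "a": [1, 2, 3, 4, 5],
        "b": [2, 4, 6, 8, 10],   # exact multiple of "a"
        "c": [5, 1, 4, 2, 3],    # weakly correlated with "a" and "b"
    })
    print(highly_correlated_columns(demo, corr_limit=0.95))
```

Until the package itself is updated, installing a NumPy release older than 1.24 (the release that removed the alias) should also avoid this particular `AttributeError`. It would not address the earlier `unstack()` `TypeError`, which is what sent execution into this fallback path.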