==================== System Information ====================
System: Linux
Node Name: 852e2a997c18
Release: 5.15.133+
Version: #1 SMP Tue Dec 19 13:14:11 UTC 2023
Machine: x86_64
Processor: x86_64
====================== CPU Information ======================
Physical cores: 2
Total cores: 4
==================== Memory Information ====================
Total: 31.36GB
Available: 29.87GB
Used: 1.05GB
############## D A T A S E T A N A L Y S I S #######################
Training Set Shape = (15000, 11)
Training Set Memory Usage = 0.36 MB
Test Set Shape = (10000, 10)
Test Set Memory Usage = 0.23 MB
Single_Label Target: ['Exited']
Random shuffling the data set before training
Using RandomizedSearchCV for Hyper Parameter Tuning. This is 3X faster than GridSearchCV...
Class -> Counts -> Percent
0: 11937 -> 79.6%
1: 3063 -> 20.4%
Target Exited is already numeric. No transformation done.
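The class distribution reported above can be reproduced with a short pandas snippet (the counts are copied directly from the log; this is a sanity check, not part of Auto_ViML's API):

```python
import pandas as pd

# Class counts copied from the log output above
y = pd.Series([0] * 11937 + [1] * 3063, name="Exited")

counts = y.value_counts()
percents = (counts / len(y) * 100).round(1)
print(pd.DataFrame({"Counts": counts, "Percent": percents}))
```

With roughly an 80/20 split, passing `Imbalanced_Flag=False` in the call at the bottom of this log is a judgment call rather than an obvious default.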
#######################################################################################
######################## C L A S S I F Y I N G V A R I A B L E S ####################
#######################################################################################
Classifying variables in data set...
Printing upto 30 columns max in each category:
Numeric Columns : ['Balance', 'EstimatedSalary']
Integer-Categorical Columns: ['CreditScore', 'Age', 'Tenure', 'NumOfProducts']
String-Categorical Columns: ['Geography']
Factor-Categorical Columns: []
String-Boolean Columns: []
Numeric-Boolean Columns: ['Gender', 'HasCrCard', 'IsActiveMember']
Discrete String Columns: []
NLP text Columns: []
Date Time Columns: []
ID Columns: []
Columns that will not be considered in modeling: []
10 Predictors classified...
Data Set Shape: 15000 rows, 10 cols
Additional details on columns:
EstimatedSalary: 0 missing, 6106 uniques, most common: {84760.3203125: 30, 141872.046875: 25}
No variables removed since no ID or low-information variables found in data set
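The variable classification above can be approximated with a single dtype-and-cardinality pass over the columns. The sketch below is a hypothetical simplification (the `cat_threshold` cutoff is an assumption), not Auto_ViML's actual `classify_columns` logic:

```python
import pandas as pd

def classify_columns_sketch(df, cat_threshold=15):
    """Toy column classifier that buckets columns the way the log groups them.

    Illustrative approximation only -- NOT autoviml's real algorithm.
    """
    buckets = {"numeric": [], "int_cat": [], "str_cat": [], "bool": []}
    for col in df.columns:
        nunique = df[col].nunique()
        if nunique == 2:
            # Two distinct values -> boolean-like (e.g. HasCrCard)
            buckets["bool"].append(col)
        elif pd.api.types.is_float_dtype(df[col]):
            buckets["numeric"].append(col)            # e.g. Balance
        elif pd.api.types.is_integer_dtype(df[col]):
            # Low-cardinality integers treated as categorical (e.g. Tenure)
            key = "int_cat" if nunique <= cat_threshold else "numeric"
            buckets[key].append(col)
        else:
            buckets["str_cat"].append(col)            # e.g. Geography
    return buckets

# Toy frame mirroring a few of the columns in the log
demo = pd.DataFrame({
    "Balance": [0.0, 1.5, 2.2],
    "Geography": ["France", "Germany", "Spain"],
    "HasCrCard": [1, 0, 1],
    "Tenure": [1, 2, 3],
})
print(classify_columns_sketch(demo))
```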
##############################################################################
D A T A P R E P A R A T I O N AND C L E A N I N G
##############################################################################
No Missing Values in train data set
Test data has no missing values. Continuing...
Completed Label Encoding and Filling of Missing Values for Train and Test Data
Binary_Classification problem: hyperparameters are being optimized for log_loss
#######################################################################################
SULOV: Searching for Uncorrelated List Of Variables in 2 features
TypeError Traceback (most recent call last)
/opt/conda/lib/python3.10/site-packages/autoviml/Auto_ViML.py in ?(train, target, test, sample_submission, hyper_param, feature_reduction, scoring_parameter, Boosting_Flag, KMeans_Featurizer, Add_Poly, Stacking_Flag, Binning_Flag, Imbalanced_Flag, GPU_flag, verbose)
1337 #### if for some reason, the above blows up due to memory error, then try this
1338 #### Dropping highly correlated Features fast using simple linear correlation ###
-> 1339 remove_list = remove_highly_correlated_vars_fast(train[num_vars], corr_limit)
1340 train_sel = left_subtract(num_vars, remove_list)
/opt/conda/lib/python3.10/site-packages/autoviml/sulov_method.py in ?(df, preds_in, modeltype, target, corr_limit, verbose, dask_xgboost_flag)
111 print('#######################################################################################')
112 ### This is a shorter version of getting unduplicated and highly correlated vars ##
--> 113 correlation_dataframe = df.corr().abs().unstack().sort_values().drop_duplicates()
114 corrdf = pd.DataFrame(correlation_dataframe[:].reset_index())
/opt/conda/lib/python3.10/site-packages/pandas/core/frame.py in ?(self, level, fill_value, sort)
9926 DataFrame
9927 A dataframe containing columns from both the caller and other.
-> 9928
9929 See Also
TypeError: unstack() takes from 2 to 3 positional arguments but 4 were given
During handling of the above exception, another exception occurred:
AttributeError                            Traceback (most recent call last)
Cell In[36], line 1
----> 1 m, feats, trainm, testm = Auto_ViML(train, target='Exited',test=test,
      2         sample_submission=sol,
      3         KMeans_Featurizer=False,
      4         hyper_param='RS',feature_reduction=True,
      5         Boosting_Flag='CatBoost', Binning_Flag=False,
      6         Add_Poly=0, Stacking_Flag=False,Imbalanced_Flag=False,
      7         GPU_flag=False, verbose=2)
File /opt/conda/lib/python3.10/site-packages/autoviml/Auto_ViML.py:1339, in Auto_ViML(train, target, test, sample_submission, hyper_param, feature_reduction, scoring_parameter, Boosting_Flag, KMeans_Featurizer, Add_Poly, Stacking_Flag, Binning_Flag, Imbalanced_Flag, GPU_flag, verbose)
1333 train_sel = FE_remove_variables_using_SULOV_method(train, red_preds,
1334 modeltype, each_target,
1335 corr_limit, verbose)
1336 except:
1337 #### if for some reason, the above blows up due to memory error, then try this
1338 #### Dropping highly correlated Features fast using simple linear correlation ###
-> 1339 remove_list = remove_highly_correlated_vars_fast(train[num_vars], corr_limit)
1340 train_sel = left_subtract(num_vars, remove_list)
1341 num_vars = train[train_sel].select_dtypes(include=[np.float64, np.float32, np.float16]).columns.tolist()
File /opt/conda/lib/python3.10/site-packages/autoviml/sulov_method.py:37, in remove_highly_correlated_vars_fast(df, corr_limit)
34 cor_matrix = df.corr().abs().astype(np.float16)
35 # Selecting upper triangle of correlation matrix
36 upper_tri = cor_matrix.where(np.triu(np.ones(cor_matrix.shape),
---> 37 k=1).astype(np.bool))
38 # Finding index of feature columns with correlation greater than 0.95
39 to_drop = [column for column in upper_tri.columns if any(upper_tri[column] > corr_limit)]
File /opt/conda/lib/python3.10/site-packages/numpy/__init__.py:324, in __getattr__(attr)
319 warnings.warn(
320 f"In the future np.{attr} will be defined as the "
321 "corresponding NumPy scalar.", FutureWarning, stacklevel=2)
323 if attr in former_attrs:
--> 324 raise AttributeError(former_attrs[attr])
326 if attr == 'testing':
327 import numpy.testing as testing
AttributeError: module 'numpy' has no attribute 'bool'.
`np.bool` was a deprecated alias for the builtin `bool`. To avoid this error in existing code, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
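Both failures trace back to version drift: autoviml's fallback helper still references the `np.bool` alias that NumPy 1.24 removed. Below is a sketch of how the failing helper from the traceback could be fixed, reconstructed from the frames shown above (the `corr_limit` default here is illustrative, not autoviml's):

```python
import numpy as np
import pandas as pd

def remove_highly_correlated_vars_fast(df, corr_limit=0.70):
    # Absolute pairwise correlations
    cor_matrix = df.corr().abs()
    # Keep only the upper triangle; use the builtin bool, since the
    # np.bool alias was removed in NumPy 1.24
    upper_tri = cor_matrix.where(
        np.triu(np.ones(cor_matrix.shape), k=1).astype(bool))
    # Columns correlated above the limit with an earlier column
    return [c for c in upper_tri.columns if any(upper_tri[c] > corr_limit)]

# Toy check: 'b' is perfectly correlated with 'a', so it gets flagged
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 1, 3, 2]})
print(remove_highly_correlated_vars_fast(df))  # ['b']
```

The only behavioral change from the library's version is `astype(bool)` in place of `astype(np.bool)`.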
m, feats, trainm, testm = Auto_ViML(train, target='Exited', test=test,
                                    sample_submission=sol,
                                    KMeans_Featurizer=False,
                                    hyper_param='RS', feature_reduction=True,
                                    Boosting_Flag='CatBoost', Binning_Flag=False,
                                    Add_Poly=0, Stacking_Flag=False, Imbalanced_Flag=False,
                                    GPU_flag=False, verbose=2)
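Until autoviml itself is patched, two workarounds are available: pin the environment (e.g. `pip install "numpy<1.24"`, along with a pandas version contemporary with the installed autoviml), or restore the removed aliases with a shim before calling Auto_ViML. The shim below is a stopgap, not an official fix:

```python
import numpy as np

# NumPy 1.24 removed the long-deprecated np.bool / np.int / np.float /
# np.object aliases. Re-point any missing ones at the builtins so older
# libraries keep working. Run this BEFORE calling Auto_ViML.
for alias, builtin in (("bool", bool), ("int", int),
                       ("float", float), ("object", object)):
    if not hasattr(np, alias):
        setattr(np, alias, builtin)
```

Note that the earlier `unstack()` TypeError suggests the installed pandas is also newer than autoviml expects, so pinning versions may be the more robust route.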