Alex-Lekov / AutoML_Alex

State-of-the art Automated Machine Learning python library for Tabular Data
MIT License

Error during training #18

Closed: atercygnus closed this issue 3 years ago

atercygnus commented 3 years ago

I'm trying to use automl_alex to build a baseline for this task, but it triggers the error below.

My code is:

import pandas as pd
from automl_alex import AutoMLRegressor

# df is the training DataFrame loaded in an earlier cell (it contains the target column 'y')
X_train = df[df.columns.difference(['y'], sort=False)]
y_train = df.y
X_test = pd.read_csv('./data/test.csv', index_col='ID')

# every column of X_train is passed as a categorical feature
model = AutoMLRegressor(X_train, y_train, X_test, cat_features=X_train.columns, verbose=1)

# next Jupyter cell:
%%time
predict_test, predict_train = model.fit_predict(verbose=2)
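
As an aside on this setup, cat_features=X_train.columns tells the library to treat every column as categorical. If only the non-numeric columns are meant to be encoded that way, a common pandas pattern is to pick them by dtype. This is just a hedged sketch, since the intended feature types for this dataset aren't shown:

# illustrative only: select categorical candidates by dtype instead of passing all columns
cat_cols = X_train.select_dtypes(include=['object', 'category']).columns
model = AutoMLRegressor(X_train, y_train, X_test, cat_features=cat_cols, verbose=1)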

Step 1: Model 0


100%|██████████| 1/1 [00:14<00:00, 14.74s/it]

Model 1
One iteration takes ~ 4.5 sec
Start Auto calibration parameters
[I 2020-11-05 11:35:57,966] A new study created in memory with name: no-name-ecd42a72-44ac-442e-9b71-d621e2a383ea
Start optimization with the parameters:
    CV_Folds = 5
    Score_CV_Folds = 2
    Feature_Selection = True
    Opt_lvl = 2
    Cold_start = 44.0
    Early_stoping = 100
    Metric = mean_squared_error
    Direction = minimize
##################################################
Default model OptScore = 109.2303
Optimize: : 55it [20:47, 22.68s/it, | Model: ExtraTrees | OptScore: 105.3311 | Best mean_squared_error: 88.854 +- 16.477105]

Predict from Models_1 100%|██████████| 3/3 [01:27<00:00, 29.22s/it] 0%| | 0/1 [00:00<?, ?it/s]

Calc predict policy on Models_1: | posible_repeats: 0 | stack_top: 1 | n_repeats: 1 100%|██████████| 1/1 [02:28<00:00, 148.44s/it]

Mean Score mean_squared_error on 5 Folds: 76.0092 std: 15.236319

Models_1 Mean mean_squared_error Score Train: 76.0097

Model 2

One iteration takes ~ 10.6 sec

Start Auto calibration parameters
Start optimization with the parameters:
    CV_Folds = 5
    Score_CV_Folds = 1
    Feature_Selection = True
    Opt_lvl = 1
    Cold_start = 10
    Early_stoping = 50
    Metric = mean_squared_error
    Direction = minimize
##################################################
Default model OptScore = 75.9577
Optimize: : 16it [02:44, 24.17s/it, | Model: MLP | OptScore: 109.1348 | Best mean_squared_error: 109.1348 ]

stack trace is:

/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/utils/extmath.py:153: RuntimeWarning: overflow encountered in matmul
  ret = a @ b
/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/utils/extmath.py:153: RuntimeWarning: invalid value encountered in matmul
  ret = a @ b
Trial 16 failed because of the following error: ValueError("Input contains NaN, infinity or a value too large for dtype('float64').")
Traceback (most recent call last):
  File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/optuna/study.py", line 799, in _run_trial
    result = func(trial)
  File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/automl_alex/models/base.py", line 420, in objective
    **data_kwargs,
  File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/automl_alex/models/base.py", line 762, in cross_val_score
    res = self.cross_val(predict=False, **kwargs)
  File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/automl_alex/models/base.py", line 700, in cross_val
    y_test=val_y.reset_index(drop=True),
  File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/automl_alex/models/sklearn_models.py", line 88, in _fit
    model.model.fit(X_train, y_train,)
  File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/neural_network/_multilayer_perceptron.py", line 641, in fit
    return self._fit(X, y, incremental=False)
  File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/neural_network/_multilayer_perceptron.py", line 371, in _fit
    intercept_grads, layer_units, incremental)
  File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/neural_network/_multilayer_perceptron.py", line 554, in _fit_stochastic
    self._update_no_improvement_count(early_stopping, X_val, y_val)
  File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/neural_network/_multilayer_perceptron.py", line 597, in _update_no_improvement_count
    self.validation_scores_.append(self.score(X_val, y_val))
  File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/base.py", line 552, in score
    return r2_score(y, y_pred, sample_weight=sample_weight)
  File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)
  File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/metrics/_regression.py", line 589, in r2_score
    y_true, y_pred, multioutput)
  File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/metrics/_regression.py", line 86, in _check_reg_targets
    y_pred = check_array(y_pred, ensure_2d=False, dtype=dtype)
  File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)
  File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 645, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 99, in _assert_all_finite
    msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
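
For context on those RuntimeWarnings: the overflow happens in MLPRegressor's forward pass, which then produces non-finite predictions that fail scikit-learn's finiteness check inside r2_score. A standalone, hypothetical sketch of the usual mitigation (plain scikit-learn with made-up data, not automl_alex internals) is to standardize the inputs before the MLP:

# minimal sketch: large-magnitude features can overflow an unscaled MLP;
# StandardScaler in front of it keeps activations and predictions finite
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 1e6, size=(500, 20))          # illustrative raw, unscaled features
y = X[:, 0] * 1e-3 + rng.normal(size=500)

model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64,), early_stopping=True, random_state=0),
)
model.fit(X, y)
print(np.isfinite(model.predict(X)).all())        # True: predictions stay finite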

python: 3.7.4
ubuntu: Ubuntu 18.04.5 LTS
packages installed: alembic==1.4.3 argon2-cffi==20.1.0 async-generator==1.10 attrs==20.2.0 automl-alex==0.10.7 backcall==0.2.0 bleach==3.2.1 catboost==0.24.2 category-encoders==2.2.2 certifi==2020.6.20 cffi==1.14.3 chardet==3.0.4 cliff==3.4.0 cmaes==0.7.0 cmd2==1.3.11 colorama==0.4.4 colorlog==4.4.0 cycler==0.10.0 decorator==4.4.2 defusedxml==0.6.0 entrypoints==0.3 graphviz==0.14.2 idna==2.10 importlib-metadata==2.0.0 ipykernel==5.3.4 ipython==7.19.0 ipython-genutils==0.2.0 jedi==0.17.2 Jinja2==2.11.2 joblib==0.17.0 json5==0.9.5 jsonschema==3.2.0 jupyter-client==6.1.7 jupyter-core==4.6.3 jupyterlab==2.2.9 jupyterlab-pygments==0.1.2 jupyterlab-server==1.2.0 kiwisolver==1.3.1 lightgbm==3.0.0 Mako==1.1.3 MarkupSafe==1.1.1 matplotlib==3.3.2 mistune==0.8.4 nbclient==0.5.1 nbconvert==6.0.7 nbformat==5.0.8 nest-asyncio==1.4.2 notebook==6.1.4 numpy==1.19.4 optuna==2.2.0 packaging==20.4 pandas==1.1.4 pandocfilters==1.4.3 parso==0.7.1 patsy==0.5.1 pbr==5.5.1 pexpect==4.8.0 pickleshare==0.7.5 Pillow==8.0.1 plotly==4.12.0 prettytable==0.7.2 prometheus-client==0.8.0 prompt-toolkit==3.0.8 ptyprocess==0.6.0 pycparser==2.20 Pygments==2.7.2 pyparsing==2.4.7 pyperclip==1.8.1 pyrsistent==0.17.3 python-dateutil==2.8.1 python-editor==1.0.4 pytz==2020.4 PyYAML==5.3.1 pyzmq==19.0.2 requests==2.24.0 retrying==1.3.3 scikit-learn==0.23.2 scipy==1.5.3 seaborn==0.11.0 Send2Trash==1.5.0 six==1.15.0 SQLAlchemy==1.3.20 statsmodels==0.12.1 stevedore==3.2.2 terminado==0.9.1 testpath==0.4.4 threadpoolctl==2.1.0 tornado==6.1 tqdm==4.51.0 traitlets==5.0.5 urllib3==1.25.11 wcwidth==0.2.5 webencodings==0.5.1 xgboost==1.2.1 zipp==3.4.0

Alex-Lekov commented 3 years ago

Can you send a link to the dataset or the Kaggle notebook itself?

Alex-Lekov commented 3 years ago

I can't reproduce the error without the data itself and the environment.

Alex-Lekov commented 3 years ago

Fixed in v0.11.24 (#19).
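
The referenced change itself isn't shown in this thread. As a general pattern, an Optuna-driven search can be made robust to individual trials that raise this kind of error by passing catch=(ValueError,) to study.optimize, or by trapping the exception inside the objective. A minimal hedged sketch with a dummy objective follows; it is not the library's actual fix:

# dummy objective that sometimes fails the way a model with non-finite predictions would
import optuna

def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    if x > 9:  # stand-in for "model produced NaN/inf predictions"
        raise ValueError("Input contains NaN, infinity or a value too large for dtype('float64').")
    return (x - 2) ** 2

study = optuna.create_study(direction="minimize")
# catch=(ValueError,) marks such trials as failed and keeps the study running
study.optimize(objective, n_trials=50, catch=(ValueError,))
print(study.best_value)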