ThomasWolf0701 commented 3 years ago

Describe the bug: When running on GPU, TabNet crashes at scale = torch.sqrt(torch.FloatTensor([0.5]).to(x.device)) with RuntimeError: CUDA error: device-side assert triggered

What is the current behavior? It works when the matrix I use contains only integers but fails with floats. I also made sure that NaN values are imputed and that there are no Inf values. The largest value fits into float32, and I also set the batch size to a very low value.
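The checks described above can be sketched as follows. This is a minimal illustration, assuming the dense feature matrix is available as a NumPy-compatible array; `check_features` is a hypothetical helper name and not part of pytorch_tabnet or sklearn.

```python
import numpy as np


def check_features(X):
    """Return basic sanity checks for a dense feature matrix (hypothetical helper)."""
    X = np.asarray(X, dtype=np.float64)
    finite = np.isfinite(X)
    abs_finite = np.abs(X[finite])
    return {
        "has_nan": bool(np.isnan(X).any()),
        "has_inf": bool(np.isinf(X).any()),
        # Every finite value must be representable in float32.
        "fits_float32": bool((abs_finite <= np.finfo(np.float32).max).all()),
        # Largest finite magnitude, or 0.0 if there are no finite values.
        "max_abs": float(abs_finite.max()) if abs_finite.size else 0.0,
    }


report = check_features([[1.0, 2.5], [3.0, 4.0]])
print(report)
```

Running the same helper on the exact array passed to fit (after imputation) would show whether any of the three conditions is violated in practice.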

If the current behavior is a bug, please provide the steps to reproduce. tab_model.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values),m__batch_size = 10)

Expected behavior

Screenshots

tab_model.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values),batch_size = 10)
Traceback (most recent call last):

File "", line 1, in tab_model.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values),batch_size = 10)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\sklearn\pipeline.py", line 329, in fit fit_params_steps = self._check_fit_params(**fit_params)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\sklearn\pipeline.py", line 248, in _check_fit_params "=sample_weight)`.".format(pname))

ValueError: Pipeline.fit does not accept the batch_size parameter. You can pass parameters to specific steps of your pipeline using the stepname__parameter format, e.g. `Pipeline.fit(X, y, logisticregression__sample_weight=sample_weight)`.

tab_model.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values),m__batch_size = 10)
No early stopping will be performed, last training weights will be used.
Traceback (most recent call last):

File "", line 1, in tab_model.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values),m__batch_size = 10)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\sklearn\pipeline.py", line 335, in fit self._final_estimator.fit(Xt, y, **fit_params_last_step)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\abstract_model.py", line 173, in fit self._train_epoch(train_dataloader)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\abstract_model.py", line 349, in _train_epoch batch_logs = self._train_batch(X, y)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\abstract_model.py", line 384, in _train_batch output, M_loss = self.network(X)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl result = self.forward(*input, **kwargs)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\tab_network.py", line 276, in forward return self.tabnet(x)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl result = self.forward(*input, **kwargs)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\tab_network.py", line 151, in forward out = self.feat_transformers[step](masked_x)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl result = self.forward(*input, **kwargs)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\tab_network.py", line 375, in forward x = self.shared(x)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl result = self.forward(*input, **kwargs)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\tab_network.py", line 409, in forward scale = torch.sqrt(torch.FloatTensor([0.5]).to(x.device))

RuntimeError: CUDA error: device-side assert triggered
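CUDA kernels are launched asynchronously, so the Python line reported alongside a device-side assert is not necessarily the operation that failed. One common way to narrow this down, sketched here as a suggestion and not something tried in this report, is to force synchronous kernel launches before PyTorch initialises CUDA and then rerun the failing call.

```python
import os

# Must be set before CUDA is initialised, so set it before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Import torch and pytorch_tabnet only after this point, then rerun the
# failing fit call. The traceback should then point at the operation that
# actually triggered the assert.
# Alternative: rerun once with device_name="cpu", which usually gives a
# more readable error message than the CUDA assert.
```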

Other relevant information: poetry version:
python version: Operating System: Additional tools:

Additional context

Optimox commented 3 years ago

Hello,

This line, "ValueError: Pipeline.fit does not accept the batch_size parameter. You can pass parameters to specific steps of your pipeline using the stepname__parameter format, e.g. Pipeline.fit(X, y, logisticregression__sample_weight=sample_weight)", makes me think that you are using TabNet inside a sklearn pipeline.
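To illustrate the stepname__parameter convention that the error message refers to, here is a simplified sketch of how such keyword arguments can be grouped by step. It is not sklearn's actual implementation; `split_fit_params` is a hypothetical helper, and the step names "i" and "m" are assumed for the example.

```python
def split_fit_params(step_names, **fit_params):
    """Group `step__param` keyword arguments by pipeline step name."""
    routed = {name: {} for name in step_names}
    for key, value in fit_params.items():
        if "__" not in key:
            # Mirrors the situation in the traceback: a bare parameter name
            # cannot be assigned to any step.
            raise ValueError(
                f"fit does not accept the {key} parameter; "
                "use the stepname__parameter format"
            )
        step, param = key.split("__", 1)
        if step not in routed:
            raise ValueError(f"unknown pipeline step: {step!r}")
        routed[step][param] = value
    return routed


routed = split_fit_params(["i", "m"], m__batch_size=10)
print(routed)  # {'i': {}, 'm': {'batch_size': 10}}
```

With this convention, batch_size=10 is rejected because it names no step, while m__batch_size=10 is forwarded to the step named "m" as batch_size=10.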

TabNet is not compatible with every sklearn pipeline, and I guess that's the problem. Could you share the code you are running with TabNet?

ThomasWolf0701 commented 3 years ago

Here it is:

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
scorer = make_scorer(mean_squared_error, greater_is_better=False)

inner_cv = TimeSeriesSplit(n_splits=5)  #.split(featureMatrix)
outer_cv = PredefinedHoldoutSplit(valid_indices=[range(0,100,1)])

# set the training parameters for Random Forest

paramsTab = {
    'm__n_steps': randint(1,3),
    'm__n_a': randint(8,64),
    'm__n_d': randint(8,64),
    'm__gamma': uniform(1, 1),
    'm__n_shared': randint(1, 5),
    'm__n_independent': randint(1, 5),
    'm__momentum': loguniform(0.01, 0.4),
    "m__mask_type": ["sparsemax", "entmax"]
}

tab_model = TabNetRegressor(device_name = "cuda")

tab_model = Pipeline(steps=[('i', imputer),('m', tab_model)])

tab_search = RandomizedSearchCV(tab_model,scoring = scorer ,param_distributions=paramsTab, random_state=42, cv=inner_cv, verbose=5, n_jobs=1, return_train_score=True)

tab_model.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values),m__batch_size = 10)
tab_search.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values))

ThomasWolf0701 commented 3 years ago

I also tried without batch_size, and now I get:
tab_model.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values))

tab_model.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values))
No early stopping will be performed, last training weights will be used.
Traceback (most recent call last):

File "", line 1, in tab_model.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values))