elephaint / pgbm

Probabilistic Gradient Boosting Machines
Apache License 2.0

Error when entering data containing NaN #21

Closed: tji5otma closed this issue 1 year ago

tji5otma commented 1 year ago

I am interested in probabilistic GBMs and found your PGBM repository while researching them. According to the Feature overview in the README, pgbm.torch.PGBM supports NaN inputs. However, when I ran the following code, I got ValueError: Input X contains NaN.

import numpy as np
from pgbm.torch import PGBMRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing

X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

# Append one extra feature column that consists entirely of NaN
nans = np.full((X_train.shape[0], 1), np.nan)
X_train = np.append(X_train, nans, axis=1)

# Fails here with ValueError: Input X contains NaN.
model = PGBMRegressor().fit(X_train, y_train)

This is a slight modification of your sample code. If you know anything about this problem, please let me know.

elephaint commented 1 year ago

Hi,

You are right: the sklearn wrapper for the Torch version slipped through the NaN-input unit tests because of a small mistake on my part. The fix was simple and has now been applied. Reinstalling PGBM from pip should solve the issue (make sure the installed version is 2.1.1). Please re-open if it is not fixed.
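For anyone landing here later, a quick way to confirm that the installed release is new enough before re-running the snippet from the report might look like the sketch below. It assumes the pip distribution is named `pgbm` and that every release from 2.1.1 onward carries the fix; `parse_version` and `has_nan_fix` are hypothetical helper names, not part of PGBM.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile

# Release named in the reply above; treating later releases as fixed is an assumption.
FIXED_VERSION = (2, 1, 1)


def parse_version(text):
    """Turn a dotted version such as '2.1.1' into a tuple of ints for comparison."""
    parts = []
    for piece in text.split(".")[:3]:
        # Keep only the leading digits so a suffix like 'rc1' is ignored.
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def has_nan_fix(dist_name="pgbm"):
    """Return True if the installed distribution is at least FIXED_VERSION."""
    try:
        installed = version(dist_name)  # assumes the distribution is called 'pgbm'
    except PackageNotFoundError:
        return False
    return parse_version(installed) >= FIXED_VERSION


if __name__ == "__main__":
    print("pgbm includes the NaN fix:", has_nan_fix())
```

If this reports False, upgrading with `pip install --upgrade pgbm` and then re-running the original reproduction is the natural next step.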