CDonnerer / xgboost-distribution

Probabilistic prediction with XGBoost.

"Singular matrix" error when I use normal distribution or negative-binomial #64

Open hrkadkhodaei opened 2 years ago

hrkadkhodaei commented 2 years ago

When I run the following code snippet I get the error `numpy.linalg.LinAlgError: Singular matrix`:

```python
X_train, y_train, X_test, y_test = read_data(InEx)

model = XGBDistribution(distribution="normal", n_estimators=500)
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], early_stopping_rounds=10)
```

The full error:

```
D:\Python37\lib\site-packages\xgboost_distribution\distributions\normal.py:89: RuntimeWarning: overflow encountered in exp
D:\Python37\lib\site-packages\xgboost_distribution\distributions\normal.py:61: RuntimeWarning: overflow encountered in exp
Traceback (most recent call last):
  File "D:\Python37\lib\contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "D:\Python37\lib\site-packages\xgboost\config.py", line 140, in config_context
    yield
  File "D:\Python37\lib\site-packages\xgboost_distribution\model.py", line 181, in fit
    callbacks=callbacks,
  File "D:\Python37\lib\site-packages\xgboost\training.py", line 196, in train
    early_stopping_rounds=early_stopping_rounds)
  File "D:\Python37\lib\site-packages\xgboost\training.py", line 81, in _train_internal
    bst.update(dtrain, i, obj)
  File "D:\Python37\lib\site-packages\xgboost\core.py", line 1685, in update
    grad, hess = fobj(pred, dtrain)
  File "D:\Python37\lib\site-packages\xgboost_distribution\model.py", line 254, in obj
    y=y, params=params, natural_gradient=self.natural_gradient
  File "D:\Python37\lib\site-packages\xgboost_distribution\distributions\normal.py", line 72, in gradient_and_hessian
    grad = np.linalg.solve(fisher_matrix, grad)
  File "<__array_function__ internals>", line 6, in solve
  File "D:\Python37\lib\site-packages\numpy\linalg\linalg.py", line 394, in solve
    r = gufunc(a, b, signature=signature, extobj=extobj)
  File "D:\Python37\lib\site-packages\numpy\linalg\linalg.py", line 88, in _raise_linalgerror_singular
    raise LinAlgError("Singular matrix")
numpy.linalg.LinAlgError: Singular matrix

Process finished with exit code 1
```

The training and test data contain 13 float features (X) and 1 integer target (y).
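For context, here is a self-contained version of the snippet above, with synthetic data standing in for `read_data(InEx)` (the data is purely illustrative and won't necessarily trigger the error):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from xgboost_distribution import XGBDistribution

# Synthetic stand-in for read_data(InEx): 13 float features, 1 integer target.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 13))
y = rng.poisson(lam=np.exp(X[:, 0])).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBDistribution(distribution="normal", n_estimators=500)
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], early_stopping_rounds=10)
```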

aleksaw commented 1 year ago

I get the same error as above when fitting a 430k-row dataset with 31 columns, but the same dataset scaled down to 43k rows works.

CyperStone commented 1 year ago

I got the same error. I tried setting the parameter `natural_gradient` to `False` to skip the line `grad = np.linalg.solve(fisher_matrix, grad)` that causes it, but then these warnings appear:

```
C:\Users\scyperski\Anaconda3\envs\cost_prediction\lib\site-packages\xgboost_distribution\distributions\normal.py:65: RuntimeWarning: divide by zero encountered in divide
  grad[:, 0] = (loc - y) / var
C:\Users\scyperski\Anaconda3\envs\cost_prediction\lib\site-packages\xgboost_distribution\distributions\normal.py:66: RuntimeWarning: divide by zero encountered in divide
  grad[:, 1] = 1 - ((y - loc) ** 2) / var
C:\Users\scyperski\Anaconda3\envs\cost_prediction\lib\site-packages\xgboost_distribution\distributions\normal.py:78: RuntimeWarning: divide by zero encountered in divide
  hess[:, 0] = 1 / var
C:\Users\scyperski\Anaconda3\envs\cost_prediction\lib\site-packages\xgboost_distribution\distributions\normal.py:79: RuntimeWarning: divide by zero encountered in divide
  hess[:, 1] = 2 * ((y - loc) ** 2) / var
```

As a result, the predictions are full of NaNs. It seems to be caused by the `log_scale` array (in the `gradient_and_hessian` method): its elements are so small that `var = np.exp(2 * log_scale)` underflows and is rounded to 0.
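For illustration, a minimal sketch (assumed, not library code) of how extreme `log_scale` values break the divisions quoted in the warnings above:

```python
import numpy as np

# Extreme log_scale values make var = exp(2 * log_scale) underflow to 0
# or overflow to inf, so the divisions from the warnings yield inf/NaN.
log_scale = np.array([-500.0, 0.0, 500.0])
var = np.exp(2 * log_scale)  # RuntimeWarning -> [0., 1., inf]

y, loc = 1.0, 0.0
grad0 = (loc - y) / var      # divide by zero -> [-inf, -1., -0.]
hess0 = 1 / var              # divide by zero -> [inf, 1., 0.]
print(var, grad0, hess0)
```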

As a workaround, I added this line before calculating the exponential: `log_scale = np.clip(log_scale, -20, 20)`

So far it works, even with the `natural_gradient` parameter set to `True`.
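In isolation, the workaround looks like this (a sketch assuming `var` is computed from `log_scale` as the warnings above suggest; `safe_var` is an illustrative name, not from the library):

```python
import numpy as np

def safe_var(log_scale, bound=20.0):
    """Bound log_scale so exp(2 * log_scale) neither underflows to 0
    nor overflows to inf (illustrative helper, not library code)."""
    log_scale = np.clip(log_scale, -bound, bound)
    return np.exp(2 * log_scale)

print(safe_var(np.array([-500.0, 0.0, 500.0])))
# -> [4.24835426e-18 1.00000000e+00 2.35385267e+17], all finite and nonzero
```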

CDonnerer commented 1 year ago

Hi, thanks for raising and debugging this. Does anyone have an example dataset / method of fitting where this happens?

CyperStone commented 1 year ago

I got permission from my workplace to share an anonymized sample dataset, and I prepared a minimal code snippet to reproduce the problem. Please contact me at szymoncyperski@gmail.com (the dataset is quite heavy).

CDonnerer commented 1 year ago

Thanks, appreciate this. I have a slight preference for finding a public dataset, just so it's easier to add to the test suite, so I'll have a look at that first and get back to you if I can't reproduce.

CDonnerer commented 1 year ago

Okay, I was able to reproduce the error with some datasets and merged a fix (#86), which is available in the latest release (xgboost-distribution==0.2.7). However, depending on the data, there could still be issues here, so please let me know if the error still occurs.

jackguac commented 4 months ago

Still got the same issue with negative-binomial. If this is still being maintained, let me know and I'll put together an MRE (minimal reproducible example).

jackguac commented 4 months ago

I've similarly found that the size of the dataset makes a difference: up to about 40k rows is fine, but above that the error occurs. It doesn't seem related to the contents of the dataset (e.g. for a 1M-row dataset, each 40k chunk is fine on its own, but passed in together they cause the error).

CDonnerer commented 4 months ago

Yes, it is still maintained. Do you have any details on the error that you're seeing (or data for reproducible example)? The above was related to numeric overflow errors, so if that's the issue, it may just need safer limits for negative-binomial.
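For what it's worth, a hypothetical diagnostic sketch (the data variables are assumed to be defined; the parameter names `n` and `p` follow scipy's `nbinom` convention and are an assumption here): fitting with `natural_gradient=False` skips the Fisher-matrix solve, so training can get far enough to inspect the predicted parameters for non-finite values.

```python
import numpy as np
from xgboost_distribution import XGBDistribution

# Hypothetical diagnostic: X_train, y_train, X_test are assumed to be the
# user's data. Skipping the natural gradient avoids the singular solve.
model = XGBDistribution(
    distribution="negative-binomial",
    natural_gradient=False,
    n_estimators=50,
)
model.fit(X_train, y_train)

preds = model.predict(X_test)  # namedtuple of distribution parameters
# Non-finite values here would point at the same overflow mechanism as above.
print(np.isfinite(preds.n).all(), np.isfinite(preds.p).all())
```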