Issue closed by mariosgeo 2 years ago.
As the error message says, the number of entries in monotone_constraints does not match the number of features (columns) in X_train. Dropping monotone_constraints from params will fix the issue, i.e., replace params with
params = {'learning_rate': 0.1, 'objective': likelihood, 'verbose': 0}
Note: monotone_constraints was included in the demo example. I realize that this might create confusion and have removed it from the demo.
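The failing check compares the length of monotone_constraints with the number of features in the training data. The sketch below mimics that check before training. check_monotone_constraints is a hypothetical helper (not part of GPBoost), and the value "bernoulli_probit" for likelihood is an assumption standing in for whatever likelihood your code actually uses.

```python
import numpy as np

def check_monotone_constraints(params, X):
    """Hypothetical helper mimicking the length check that fails in gbdt.cpp.

    Returns a copy of params:
    - unchanged if 'monotone_constraints' is absent,
    - unchanged if it has one entry per column of X,
    - with 'monotone_constraints' dropped otherwise (the fix suggested above).
    """
    params = dict(params)  # do not mutate the caller's dict
    constraints = params.get('monotone_constraints')
    if constraints is None:
        return params
    num_features = np.asarray(X).shape[1]
    if len(constraints) != num_features:
        print(f"Dropping monotone_constraints: {len(constraints)} constraints "
              f"for {num_features} features")
        del params['monotone_constraints']
    return params

# Example: demo-style params with two constraints, but X_train has five columns.
likelihood = "bernoulli_probit"  # assumption: replace with your own likelihood
X_train = np.random.rand(100, 5)
params = {'learning_rate': 0.1, 'objective': likelihood, 'verbose': 0,
          'monotone_constraints': [1, 0]}
safe_params = check_monotone_constraints(params, X_train)
```

Alternatively, you can keep the constraints and supply exactly one entry per column of X_train (in LightGBM-style boosting, 1 means increasing, -1 decreasing and 0 unconstrained).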
Note: using Gaussian processes on large data requires some approximation. Your current code will not finish in a reasonable amount of time. The go-to option for large data in GPBoost is a Vecchia approximation:
gp_model = gpb.GPModel(gp_coords=coords_train[:10000,:], cov_function="exponential",
                       likelihood=likelihood, vecchia_approx=True, num_neighbors=15)
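To illustrate what num_neighbors controls: in a basic Vecchia approximation, the joint density is approximated as a product of conditionals p(y_i | y_N(i)), where N(i) holds at most num_neighbors nearest points among those earlier in some ordering. The sketch below builds these conditioning sets with brute-force distances. The row ordering, the function name vecchia_neighbors and the simulated coordinates are assumptions for illustration; GPBoost's internal ordering and neighbor search may differ.

```python
import numpy as np

def vecchia_neighbors(coords, num_neighbors):
    """Conditioning sets of a basic Vecchia approximation (illustrative sketch).

    Point i is conditioned only on its `num_neighbors` nearest neighbours
    among points 0..i-1 (row order). Brute-force distances for clarity;
    real implementations use spatial search structures.
    """
    coords = np.asarray(coords, dtype=float)
    neighbors = []
    for i in range(coords.shape[0]):
        if i == 0:
            neighbors.append(np.array([], dtype=int))  # first point: no conditioning
            continue
        d = np.linalg.norm(coords[:i] - coords[i], axis=1)
        k = min(num_neighbors, i)
        neighbors.append(np.argsort(d)[:k])
    return neighbors

rng = np.random.default_rng(0)
coords_demo = rng.uniform(size=(2000, 2))  # hypothetical 2D coordinates
nbrs = vecchia_neighbors(coords_demo, num_neighbors=15)

n = coords_demo.shape[0]
dense_entries = n * n                                   # exact GP: full n x n covariance
vecchia_entries = n + sum(len(nb) for nb in nbrs)       # diagonal + neighbour links
print(dense_entries, vecchia_entries)
```

The number of stored dependencies grows roughly like n * num_neighbors instead of n^2, which is why this scales to large data when the Gaussian likelihood is used.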
Unfortunately, in the current implementation, the Vecchia approximation does not work well for non-Gaussian data, including binary data. An alternative is to use a compactly supported covariance function, such as a tapered covariance function or a Wendland covariance function. The latter can be done as follows:
gp_model = gpb.GPModel(gp_coords=coords_train[:10000,:], cov_function="wendland",
                       likelihood=likelihood, cov_fct_taper_range=10)
You might need to try different values for cov_fct_taper_range or, better, tune it. We are currently working on a better large-data solution for non-Gaussian data. However, at this point, I cannot give any indication of when this will be released.
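To see why cov_fct_taper_range matters, the sketch below uses one common Wendland function, psi_{3,1}(h) = (1 - h)_+^4 (4h + 1) with h = distance / range, which is exactly zero beyond the range, so the covariance matrix becomes sparse. This particular form, the function names and the simulated coordinates are assumptions; GPBoost's own Wendland parametrization may differ.

```python
import numpy as np

def wendland(dist, taper_range):
    """Wendland function psi_{3,1}: (1 - h)_+^4 (4h + 1), h = dist / taper_range.

    Positive definite in up to three dimensions and exactly zero for
    dist >= taper_range. Assumed form, for illustration only.
    """
    h = np.asarray(dist, dtype=float) / taper_range
    return np.where(h < 1.0, (1.0 - h) ** 4 * (4.0 * h + 1.0), 0.0)

def covariance_density(coords, taper_range):
    """Fraction of non-zero entries in the Wendland covariance matrix."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    cov = wendland(dist, taper_range)
    return np.count_nonzero(cov) / cov.size

rng = np.random.default_rng(0)
coords_w = rng.uniform(0, 100, size=(1000, 2))  # hypothetical coordinates on a 100 x 100 domain
for taper_range in (5, 10, 20, 50):
    print(taper_range, round(covariance_density(coords_w, taper_range), 3))
```

A larger range yields a denser matrix (more computation) but allows correlation over longer distances, which is the trade-off behind tuning cov_fct_taper_range.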
Concerning your second question: X_train and coords_train do not need to be different; they can be the same. The demo contains an example of spatial data where coords_train contains the spatial coordinates and X_train contains other features / covariates. This is a typical situation in spatial statistics, but it is not a requirement.
Hello, I want to fit a combined tree-boosting and Gaussian process model to the attached data (keep.csv).
1) I get the following error:
[GPBoost] [Fatal] Check failed: (static_cast(traindata->num_total_features())) == (config->monotone_constraints.size()) at C:\Users\whsigris\Dropbox\HSLU\Projects\MixedBoost\GPBoost\python-package\compile\src\LightGBM\boosting\gbdt.cpp, line 55 .
2) A bonus question: why do X_train and coords_train have to be different? I understand that in your example you wanted to remove one corner. But for other types of data, can't I use the same?