Closed m-haines closed 7 months ago
Thanks for reporting this. I can reproduce this when setting the option use_gp_model_for_validation=False in the gpb.cv() or grid_search_tune_parameters() function. A fix for this is on GitHub and will be on PyPI soon.
Note that it is not recommended to use use_gp_model_for_validation=False. Rather, use use_gp_model_for_validation=True, since you will likely also use the gp_model part for making predictions.
Thank you for the information and advice on use_gp_model_for_validation=True, I have been using that, as I do use the gp_model part for predictions.
I should have mentioned this before, but are you able to reproduce it when instantiating a standalone GPModel with the negative_binomial likelihood, as given below? That is when the error occurs for me.
gp_model = gpb.GPModel(gp_coords=coords_train.transpose(), cov_function="exponential",
likelihood="negative_binomial, gp_approx="vecchia")
Although from what you have said, I think the fix should resolve the issue. I look forward to trying it soon.
I can only reproduce this error when I set use_gp_model_for_validation=False and e.g. metric="mse". Also, the corresponding code in regression_objective.hpp should only be called when use_gp_model_for_validation=False. Note that you are missing a quotation mark in your code. It should be:
gp_model = gpb.GPModel(gp_coords=coords_train.transpose(), cov_function="exponential",
likelihood="negative_binomial", gp_approx="vecchia")
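A missing quotation mark like this one can be caught before running anything by asking Python to parse the snippet. The sketch below is only illustrative: the helper name is_valid_python is hypothetical, and the two snippets are parsed but never executed, so gpb and coords_train do not need to exist.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python. Nothing is executed."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# Snippet as originally posted: the closing quote after negative_binomial is missing.
broken = (
    'gp_model = gpb.GPModel(gp_coords=coords_train.transpose(), cov_function="exponential",\n'
    'likelihood="negative_binomial, gp_approx="vecchia")\n'
)

# Corrected snippet with the quotation mark restored.
fixed = (
    'gp_model = gpb.GPModel(gp_coords=coords_train.transpose(), cov_function="exponential",\n'
    'likelihood="negative_binomial", gp_approx="vecchia")\n'
)

print("broken parses:", is_valid_python(broken))
print("fixed parses:", is_valid_python(fixed))
```

A parse check like this only rules out syntax problems. It says nothing about whether the installed GPBoost version accepts the likelihood name.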
I just realized that your error message
[GPBoost] [Fatal] Likelihood of type 'negative_binomial' is not supported.
is not from the file regression_objective.hpp. There it would say:
[GPBoost] [Fatal] ConvertOutput: Likelihood of type 'negative_binomial' is not supported.
That leaves me a little puzzled, as the reason must be something else. Can you provide a reproducible example including data?
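The distinction between the two messages can be sketched as a small check on the message text. This is a minimal illustration, not part of GPBoost: the function name raised_by_convert_output is hypothetical, and it relies only on the "ConvertOutput:" prefix described above.

```python
LOG_PREFIX = "[GPBoost] [Fatal] "


def raised_by_convert_output(message: str) -> bool:
    """Return True if the fatal message carries the 'ConvertOutput:' prefix.

    Per the comment above, that prefix marks the message coming from
    regression_objective.hpp; a message without it must come from elsewhere.
    """
    body = message[len(LOG_PREFIX):] if message.startswith(LOG_PREFIX) else message
    return body.startswith("ConvertOutput:")


reported = "[GPBoost] [Fatal] Likelihood of type 'negative_binomial' is not supported."
expected = (
    "[GPBoost] [Fatal] ConvertOutput: Likelihood of type 'negative_binomial' "
    "is not supported."
)

print(raised_by_convert_output(reported))
print(raised_by_convert_output(expected))
```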
I can, yes. The example below reproduces the error on my machine. It uses the home_sales data from geodatasets. I have put it together quickly, so the negative binomial might be a poor choice of likelihood for the dataset.
import geopandas # version 0.14.2
import geodatasets # version 2023.12.0
import gpboost as gpb # version 1.2.7.1
import numpy as np # version 1.26.3
home_sales = geopandas.read_file(geodatasets.get_path("geoda.home_sales"))
home_sales_coords = home_sales.get_coordinates()
home_sales["x"] = home_sales_coords["x"]
home_sales["y"] = home_sales_coords["y"]
# Remove duplicate coord values from home_sales so non-Gaussian likelihood can be fitted by GPBoost,
# keeping the more expensive house at that location
home_sales = home_sales.sort_values(by=['price'])
home_sales = home_sales.drop_duplicates(subset=["x", "y"], keep='last')
home_sales = home_sales.sort_index()
home_sales = home_sales.reset_index(drop=True)
coords_train = np.array([home_sales["x"].to_numpy(), home_sales["y"].to_numpy()])
gp_model = gpb.GPModel(gp_coords=coords_train.transpose(), cov_function="exponential",
likelihood="negative_binomial", gp_approx="vecchia")
x_train = home_sales[["bedrooms", "bathrooms", "sqft_liv", "sqft_lot", "floors", "view"]]
y_train = home_sales["price"]
data_train = gpb.Dataset(x_train, y_train)
params = { 'lambda_l2': 1, 'learning_rate': 0.01,
'max_depth': 3, 'min_data_in_leaf': 20,
'num_leaves': 2**10, 'verbose': 0}
mod = gpb.train(params=params, train_set=data_train,
gp_model=gp_model, num_boost_round=247)
If it helps, using GitHub search, the only other place I can find that mentions a "Likelihood of type" error is ./include/GPBoost/likelihoods.h, lines 86 and 187. An example for context is below:
Likelihood(string_t type,
    data_size_t num_data,
    data_size_t num_re,
    bool has_a_vec,
    bool use_Z_for_duplicates,
    const data_size_t* random_effects_indices_of_data) {
    string_t likelihood = ParseLikelihoodAlias(type);
    likelihood = ParseLikelihoodAliasGradientDescent(likelihood);
    if (SUPPORTED_LIKELIHOODS_.find(likelihood) == SUPPORTED_LIKELIHOODS_.end()) {
        Log::REFatal("Likelihood of type '%s' is not supported.", likelihood.c_str());
    }
However, it looks as though "negative_binomial" is in the list of SUPPORTED_LIKELIHOODS, so I didn't mention it yesterday as I didn't think that was the issue.
Version 1.2.7.1 does not yet support the negative binomial likelihood. This is a relatively new feature and, unfortunately, I have not released any updates on PyPI for some time. As of today, version 1.3.0 is on PyPI which supports the negative binomial likelihood.
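Since the cause was the installed version, a quick check of the installed gpboost version before choosing the likelihood can save some debugging. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the 1.3.0 threshold is taken from the comment above, and the parser only looks at the leading numeric components of the version string.

```python
from importlib import metadata

# Threshold taken from this thread: 1.3.0 supports negative_binomial, 1.2.7.1 does not.
MIN_VERSION = (1, 3, 0)


def parse_version(text: str) -> tuple:
    """Turn '1.2.7.1' into (1, 2, 7, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def supports_negative_binomial(version_text: str) -> bool:
    """Compare a version string against the assumed 1.3.0 threshold."""
    return parse_version(version_text) >= MIN_VERSION


def installed_gpboost_version():
    """Return the installed gpboost version string, or None if it is not installed."""
    try:
        return metadata.version("gpboost")
    except metadata.PackageNotFoundError:
        return None


installed = installed_gpboost_version()
if installed is None:
    print("gpboost is not installed in this environment")
else:
    print(installed, "supports negative_binomial:", supports_negative_binomial(installed))
```

Running pip show gpboost gives the same version information from the command line.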
Apologies for the slow reply, I was away for a few days. All is working now, thank you for deploying 1.3.0 to PyPI.
Further to an earlier issue I posted (which was user error), I thought I would try out a few of the other likelihoods for a simple toy model, just so that I know what I am doing in the future. The code is roughly as before, except that the response variable, y, is converted to the int type prior to this code block:
However, the negative_binomial gives the following error:
I think this might be because in
line 212 ish, likelihood type "negative_binomial" is not listed, which leads to the error message given.
Is this intentional? Although once again, it could be user error or a lack of understanding on my part.
Thank you once again for your help.