juansecal closed this issue 6 months ago.
Could you please give a self-contained, reproducible example of the problem? If that's not possible, then please at least show us the full stack trace with the error.
I would expect your code snippet to fail with:
File ~/glum/src/glum/_glm.py:2815, in GeneralizedLinearRegressor._validate_hyperparameters(self)
2803 raise ValueError(
2804 "Penalty term must be a non-negative number;"
2805 " got (alpha={})".format(self.alpha)
2806 )
2808 if (
2809 not np.isscalar(self.l1_ratio)
2810 # check for numeric, i.e. not a string
(...)
2813 or self.l1_ratio > 1
2814 ):
-> 2815 raise ValueError(
2816 "l1_ratio must be a number in interval [0, 1];"
2817 " got (l1_ratio={})".format(self.l1_ratio)
2818 )
2819 super()._validate_hyperparameters()
ValueError: l1_ratio must be a number in interval [0, 1]; got (l1_ratio=1.5)
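The check that produces this error can be sketched as follows. This is a minimal stand-alone approximation of the condition in the traceback, not glum's actual code. The `validate_l1_ratio` name is hypothetical, and the part of the condition elided as `(...)` is assumed to reject non-numeric values and values below 0.

```python
import numbers


def validate_l1_ratio(l1_ratio):
    """Hypothetical stand-in for the l1_ratio check shown in the traceback.

    Assumes the elided part of glum's condition rejects non-numeric
    values (including strings) and values below 0.
    """
    if (
        not isinstance(l1_ratio, numbers.Real)  # rejects strings, None, lists
        or l1_ratio < 0
        or l1_ratio > 1
    ):
        raise ValueError(
            "l1_ratio must be a number in interval [0, 1];"
            " got (l1_ratio={})".format(l1_ratio)
        )
    return l1_ratio


validate_l1_ratio(0.5)  # passes
try:
    validate_l1_ratio(1.5)
except ValueError as err:
    print(err)  # l1_ratio must be a number in interval [0, 1]; got (l1_ratio=1.5)
```

So `l1_ratio=1.5` is rejected during hyperparameter validation, before any fitting starts.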
Here's an example of how to fit a Tweedie model (with alpha_search=True) using the data shown in the README:
from sklearn.datasets import fetch_openml
from glum import GeneralizedLinearRegressor
from glum import TweedieDistribution
# This dataset contains house sale prices for King County, which includes
# Seattle. It includes homes sold between May 2014 and May 2015.
house_data = fetch_openml(name="house_sales", version=3, as_frame=True)
X = house_data.data[
[
"bedrooms",
"bathrooms",
"sqft_living",
"floors",
"waterfront",
"view",
"condition",
"grade",
"yr_built",
"yr_renovated",
]
]
y = house_data.target
model = GeneralizedLinearRegressor(
family=TweedieDistribution(1.5),
alpha_search=True,
l1_ratio=0.5,
fit_intercept=True,
max_iter=200,
)
model.fit(X=X, y=y)
Sure. The data is just losses and exposure (classic GLM fitting), with no NA or Inf values and a frequency of 3%.
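If losses and exposure are turned into a Tweedie target in the common way (pure premium = losses / exposure, with exposure as the sample weight), rows with zero exposure are one way infinite or NaN values can appear even when the raw columns are clean: in NumPy, a positive number divided by 0 gives inf and 0 / 0 gives NaN. The sketch below assumes that setup, which the thread does not confirm. `make_tweedie_target` is a hypothetical helper, and passing exposure as `sample_weight` to `fit` is an assumption about how the model would be called.

```python
import numpy as np


def make_tweedie_target(losses, exposure):
    """Hypothetical helper: build a pure-premium target and weights.

    Assumes the model is fit on losses / exposure with exposure as the
    sample weight. Rows with non-positive or non-finite exposure (or
    non-finite losses) are dropped, since dividing by zero exposure
    yields inf (x / 0) or NaN (0 / 0).
    """
    losses = np.asarray(losses, dtype=float)
    exposure = np.asarray(exposure, dtype=float)

    keep = np.isfinite(losses) & np.isfinite(exposure) & (exposure > 0)
    y = losses[keep] / exposure[keep]  # masked first, so no division by zero
    return y, exposure[keep], keep


losses = np.array([0.0, 1200.0, 0.0, 350.0])
exposure = np.array([1.0, 0.5, 0.0, 0.25])  # third row has zero exposure
y, weights, keep = make_tweedie_target(losses, exposure)
print(keep.tolist())  # [True, True, False, True]
print(y.tolist())     # [0.0, 2400.0, 1400.0]
# Assumed usage: fit on X[keep] with y and sample_weight=weights.
```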
Coordinate descent did not converge. You might want to increase the number of iterations. Minimum norm subgradient: nan, tolerance: nan
new_coef, gap, _, _, n_cycles = enet_coordinate_descent_gram(
Traceback (most recent call last):
File "C:\Users\jcalderon\AppData\Local\JetBrains\PyCharm Community Edition 2023.2.1\plugins\python-ce\helpers\pydev\pydevconsole.py", line 364, in runcode
coro = func()
File "", line 1, in
Thanks! Based on that output alone, it's difficult for me to tell what's going wrong. Sorry! Also, as mentioned above, if you're really running this with l1_ratio=1.5, I would expect you to hit a different error.
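When only a warning and a truncated traceback are available, one way to get more context is to promote the convergence warning to an exception, so the stack trace points at the first place the NaN shows up. The sketch below uses only the standard `warnings` and `traceback` modules. `fake_fit` is a hypothetical stand-in for `model.fit(X=X, y=y)`, and the filter matches on the start of the message text shown above rather than on a specific warning class, since the class glum emits isn't shown in this thread.

```python
import traceback
import warnings


def fake_fit():
    """Hypothetical stand-in for model.fit(X=X, y=y) that emits the same warning."""
    warnings.warn(
        "Coordinate descent did not converge. You might want to increase "
        "the number of iterations. Minimum norm subgradient: nan, tolerance: nan"
    )
    return "fitted"


def run_with_warnings_as_errors(fit):
    """Run `fit`, turning the convergence warning into an exception.

    Returns the full traceback as a string if the warning fires, else None.
    """
    with warnings.catch_warnings():
        # `message` is a regex matched against the start of the warning text.
        warnings.filterwarnings("error", message="Coordinate descent did not converge")
        try:
            fit()
        except Warning:
            return traceback.format_exc()
    return None


tb = run_with_warnings_as_errors(fake_fit)
print(tb)  # full traceback, ending in "UserWarning: Coordinate descent did not converge..."
```

Running the real fit this way, as a plain script rather than in an IDE console, should give a complete traceback that is easier to share.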
Are you using a private data set or are you testing this against, e.g., the publicly available "French Motor TPL Insurance Claims Data" (which we also use in our benchmark suite)?
The Tweedie regression fails with ValueError: array must not contain infs or NaNs.
There are no infinite or NaN values.
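Since the error reports infs or NaNs even though the raw data has none, it may help to check the exact arrays passed to `fit` (after any preprocessing) rather than the source data, along with the target range: a Tweedie distribution with power between 1 and 2, such as `TweedieDistribution(1.5)`, is only defined for non-negative targets. The helper below is a hypothetical sketch using NumPy only. It assumes a numeric 2-D `X` and does not reproduce any check glum itself performs.

```python
import numpy as np


def describe_bad_values(X, y, sample_weight=None):
    """Hypothetical pre-fit check for a Tweedie GLM with power in (1, 2).

    Returns a list of human-readable problems; an empty list means none found.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    problems = []

    # Indices of X columns that contain NaN or +/-inf.
    bad_cols = np.where(~np.isfinite(X).all(axis=0))[0]
    if bad_cols.size:
        problems.append(f"non-finite values in X columns {bad_cols.tolist()}")

    if not np.isfinite(y).all():
        problems.append("non-finite values in y")
    # Tweedie with 1 < p < 2 is only defined for y >= 0.
    if (y < 0).any():
        problems.append(f"{int((y < 0).sum())} negative values in y")

    if sample_weight is not None:
        w = np.asarray(sample_weight, dtype=float)
        if not np.isfinite(w).all() or (w < 0).any():
            problems.append("sample_weight has non-finite or negative entries")

    return problems


X = np.array([[1.0, 2.0], [3.0, np.inf], [5.0, 6.0]])
y = np.array([0.0, 10.0, -2.0])
for problem in describe_bad_values(X, y, sample_weight=[1.0, np.nan, 0.5]):
    print(problem)
```

Running it on the final design matrix, target, and weights (not just the raw data frame) covers values introduced by encoding or division steps.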