how to compute psquared in GLM estimator

glm-tools / pyglmnet

Python implementation of elastic-net regularized generalized linear models

http://glm-tools.github.io/pyglmnet/

MIT License


how to compute psquared in GLM estimator #304

Closed jpainam closed 5 years ago

jpainam commented 5 years ago

Hi, I'm comparing several estimators and want to get the psquared for GLM estimators using the negative binomial, Gaussian, and Poisson distributions. I can get the psquared for the other models, but not for GLM. Please help.

jasmainak commented 5 years ago

Hi, sorry, I don't follow your question. Can you explain it a bit more and maybe illustrate with code? Thanks.

jpainam commented 5 years ago

Thank you. I mean: how do I compute the pseudo R^2 for GLM estimators? How can I get its value after fitting a GLM estimator on my dataset?

jpainam commented 5 years ago

@jasmainak Hi, here is some sample code. How do I get the pseudo R^2?

import numpy as np
import scipy.sparse as sps
from sklearn.preprocessing import StandardScaler
from pyglmnet import GLM

glm = GLM(distr='poisson')
n_samples, n_features = 10000, 100
beta0 = np.random.normal(0.0, 1.0, 1)
beta = sps.rand(n_features, 1, 0.1)
beta = np.array(beta.todense())

Xtrain = np.random.normal(0.0, 1.0, [n_samples, n_features])
ytrain = glm.simulate(beta0, beta, Xtrain)

Xtest = np.random.normal(0.0, 1.0, [n_samples, n_features])
ytest = glm.simulate(beta0, beta, Xtest)
# fit the model on the training data
scaler = StandardScaler().fit(Xtrain)
glm.fit(scaler.transform(Xtrain), ytrain)

yhat = glm.predict(scaler.transform(Xtest))

deviance = glm.score(scaler.transform(Xtest), ytest)

jasmainak commented 5 years ago

You need to instantiate your GLM object with the scoring metric:

glm = GLM(distr='poisson', score_metric='pseudo_R2')
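
For anyone who wants to check the number by hand, here is a minimal sketch of a deviance-based pseudo R^2 for Poisson data, 1 - D(model) / D(null). It uses only the standard library and works on lists or NumPy arrays. The function names and the choice of null model (a constant prediction, e.g. the mean of the training targets) are assumptions for illustration; this is not taken from the pyglmnet source, so its result may differ from what `score_metric='pseudo_R2'` returns.

```python
import math


def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu)), with 0 * log(0) = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # The log term vanishes for zero counts, so skip it to avoid log(0).
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total


def pseudo_r2(y, yhat, ynull):
    """Deviance-based pseudo R^2: 1 - D(model) / D(null).

    ynull is the constant prediction of the null model (an assumption here:
    typically the mean of the training targets).
    """
    null_mu = [ynull] * len(y)
    return 1.0 - poisson_deviance(y, yhat) / poisson_deviance(y, null_mu)


# Tiny made-up example: a perfect model scores 1, the null model scores 0.
y = [0, 1, 2, 3, 4]
mean_y = sum(y) / len(y)
r2_perfect = pseudo_r2(y, y, mean_y)
r2_null = pseudo_r2(y, [mean_y] * len(y), mean_y)
print(r2_perfect, r2_null)
```

With the sample code above, the call would look like `pseudo_r2(ytest, yhat, ytrain.mean())`, assuming `ytest` and `yhat` are flattened to one dimension and `yhat` holds a single set of predictions.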

jpainam commented 5 years ago

Thank you. This is the result when I use score_metric='pseudo_R2':

[0.38726042 0.62702618 0.74175937 0.79341947 0.82873337 0.83989782
 0.84622304 0.85146645 0.8528544  0.85402913]

I was expecting a single value. Which one is the pseudo R2 value?

jpainam commented 5 years ago

I also get the error below when I use my own data set. Since this issue is resolved, I will close it and open a new issue for the problem with my own data. Thank you again.

C:\Users\Paul\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\utils\validation.py:475: DataConversionWarning: Data with input dtype object was converted to float64 by StandardScaler.
  warnings.warn(msg, DataConversionWarning)
Traceback (most recent call last):
  File "gpyglmnet.py", line 26, in <module>
    glm.fit(scaler.transform(Xtrain), ytrain)
  File "Cxxxx\lib\site-packages\pyglmnet\pyglmnet.py", line 634, in fit
    beta[0], beta[1:], rl, X, y)
  File "xxxxxxxx\Python36\lib\site-packages\pyglmnet\pyglmnet.py", line 389, in _grad_L2loss
    X[selector, :]))
TypeError: ufunc 'add' output (typecode 'O') could not be coerced to provided output parameter (typecode 'd') according to the casting rule ''same_kind''
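
The DataConversionWarning says the inputs arrive with dtype object, and the TypeError complains about an object-typed ('O') result being added into a float ('d') array. StandardScaler only converts X, so the target array is a likely culprit. A possible workaround, sketched with made-up stand-in data and assuming every value is numeric, is to cast both arrays to float64 before scaling and fitting:

```python
import numpy as np

# Made-up stand-ins for arrays loaded with dtype=object
# (this can happen when data comes from a mixed-type table).
Xtrain = np.array([[1, 2.5], [3, 4.0], [5, 6.5]], dtype=object)
ytrain = np.array([1, 0, 3], dtype=object)

# Cast explicitly. This raises an error if a value cannot be
# interpreted as a number, which also helps locate bad entries.
Xtrain = Xtrain.astype(np.float64)
ytrain = ytrain.astype(np.float64)

print(Xtrain.dtype, ytrain.dtype)
```

This is a guess based on the traceback alone; the real cause may differ.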

jasmainak commented 5 years ago

Regarding which one is the pseudo R2 value: are you running this on the regularization path? That is probably why you have 10 scores, one per regularization strength.
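
To get a single number out of a regularization path, one option is to pick the score that belongs to the regularization strength you care about, for example the best-scoring one. The sketch below uses the 10 scores posted earlier in this thread; the `reg_lambdas` grid is a hypothetical placeholder (10 log-spaced values from 0.5 to 0.01), so substitute the grid your model was actually fitted with.

```python
# Scores copied from the output posted earlier in this thread.
scores = [0.38726042, 0.62702618, 0.74175937, 0.79341947, 0.82873337,
          0.83989782, 0.84622304, 0.85146645, 0.8528544, 0.85402913]

# Hypothetical grid of regularization strengths, log-spaced from 0.5 to 0.01.
# This is an assumption for illustration, not read from the fitted model.
n = len(scores)
reg_lambdas = [0.5 * (0.01 / 0.5) ** (i / (n - 1)) for i in range(n)]

# Index of the highest pseudo R2 along the path.
best = max(range(n), key=lambda i: scores[i])
print(f"best pseudo R2 = {scores[best]:.4f} at reg_lambda = {reg_lambdas[best]:.4f}")
```

Choosing the strength on the same data you report the score for gives an optimistic estimate, so a separate validation set is safer for the selection step.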