bbalasub1 / glmnet_python

GNU General Public License v3.0

Following cvglmnet documentation for foldid causes ValueError: You need to supply a value for newx #39

Closed namheegordonkim closed 5 years ago

namheegordonkim commented 5 years ago

The iteration for folds in cvelnet is 0-based.

https://github.com/bbalasub1/glmnet_python/blob/7f78608c877d9dc8d0456ad63df72c655cb21221/glmnet_python/cvelnet.py#L37

The documentation for cvglmnet implies 1-based.

https://github.com/bbalasub1/glmnet_python/blob/7f78608c877d9dc8d0456ad63df72c655cb21221/glmnet_python/cvglmnet.py#L41

namheegordonkim commented 5 years ago

I thought I was being pretty clear. The documentation in cvglmnet.py implies that foldid values should range from 1 to nfold, but the for-loop in cvelnet.py iterates over 0 to nfolds-1.

namheegordonkim commented 5 years ago

I don't think we agree on what foldid is supposed to do. From the documentation:

 foldid      an optional vector of values between 1 and nfold identifying
             what fold each observation is in. If supplied, nfold can be
             missing.

foldid is supposed to be a vector of integers assigning a fold to each example in X, and this vector is used to create the training fold and the validation fold.
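To make that reading of foldid concrete, here is a minimal NumPy-only sketch of how such a vector could partition the rows of X into training and validation sets. split_by_fold is a hypothetical helper written for illustration; it is not part of glmnet_python, and it deliberately iterates over whatever labels appear in foldid rather than assuming a 0-based or 1-based convention.

```python
import numpy as np

def split_by_fold(x, foldid):
    """Yield (label, train_rows, validation_rows) for each distinct fold label.

    Hypothetical helper for illustration only; not part of glmnet_python.
    """
    foldid = np.asarray(foldid)
    for label in np.unique(foldid):
        held_out = foldid == label          # boolean mask for this fold
        yield label, x[~held_out], x[held_out]

# Six observations, two features, three folds labelled as the docstring suggests.
x = np.arange(12, dtype=float).reshape(6, 2)
foldid = np.array([1, 2, 3, 1, 2, 3])

sizes = {int(label): (len(train), len(valid))
         for label, train, valid in split_by_fold(x, foldid)}
print(sizes)  # {1: (4, 2), 2: (4, 2), 3: (4, 2)}
```

Each fold holds out two rows and trains on the remaining four, which is the behaviour I would expect from any valid foldid.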

From cvelnet.py:

for i in range(nfolds):
    which = foldid == i
    fitobj = fit[i].copy()
    fitobj['offset'] = False
    preds = glmnetPredict(fitobj, x[which, ])
    nlami = scipy.size(fit[i]['lambdau'])
    predmat[which, 0:nlami] = preds

Here, which is a boolean mask obtained by comparing foldid to the integers 0 through nfolds-1, but according to the documentation, foldid is not supposed to contain any 0.
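The mismatch can be reproduced without glmnet at all. The sketch below only counts how many rows each iteration of a range(nfolds) loop would select; rows_per_fold is a hypothetical helper, and the link to the ValueError in the title is my assumption (an empty x[which, ] being passed to glmnetPredict), not something I have traced through the library.

```python
import numpy as np

def rows_per_fold(foldid, nfolds):
    """Count the rows selected by `foldid == i` for i in range(nfolds).

    Hypothetical helper mirroring only the masking step of the loop above.
    """
    foldid = np.asarray(foldid)
    return [int(np.sum(foldid == i)) for i in range(nfolds)]

nfolds = 3
foldid_doc = np.array([1, 2, 3, 1, 2, 3])   # 1-based, as the docstring describes

one_based = rows_per_fold(foldid_doc, nfolds)
zero_based = rows_per_fold(foldid_doc - 1, nfolds)

print(one_based)   # [0, 2, 2] -> fold 0 is empty, label 3 is never visited
print(zero_based)  # [2, 2, 2] -> every fold gets its rows
```

With 1-based labels the first iteration selects no rows and the rows labelled nfolds are never predicted. Subtracting 1 from foldid before passing it in looks like a workable stopgap until the documentation and the loop agree.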

bbalasub1 commented 5 years ago

You are correct. This is an artifact from R. I will fix the doc to reflect this. Thank you for pointing this out.
