Closed notuntoward closed 4 years ago
This looks like a doc bug; the README seems to be a leftover from an old pyglmnet version in which simulate() wasn't a GLM object method. I changed the example code to call the simulate() method and it seems to work.
Is other documentation up to date enough that I could try pyglmnet on some experiments?
Actually, this is a mismatch between the documentation for the stable version and the latest release. Sorry about that. We are going to make a release soon, so the README should be accurate for the next release. Note that the standalone simulate_glm function is now the recommended way of doing things, rather than the simulate() method.
Thanks for the quick reply. Should I wait for your new release or is what's on pip already usable?
I think you can already use the development version. It has some bug fixes too.
Just do:
$ pip install https://api.github.com/repos/glm-tools/pyglmnet/zipball/master
I used the new pip command to install pyglmnet-1.0.1.dev0 but the script in the README still fails:
File "C:\Users\Scott\Anaconda3\lib\site-packages\pyglmnet\pyglmnet.py", line 718, in fit
raise ValueError(msg)
ValueError: X must be a 2D ndarray, and y must be 1D
This is because the README script creates y_train like so:
y_train = simulate_glm("poisson", beta0, beta, X_train)
but glm.fit() tests for a one dimensional y like this:
if X.ndim != 2 or y.ndim != 1:
The check doesn't work because:
y_train.shape == (1000, 1)
y_train.ndim == 2
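To see the mismatch in isolation, here is a standalone numpy sketch (not taken from pyglmnet itself):

```python
import numpy as np

# A column vector: 1000 samples, but stored with an explicit trailing axis.
y_train = np.zeros((1000, 1))

print(y_train.shape)  # (1000, 1)
print(y_train.ndim)   # 2 -- so a `y.ndim != 1` check rejects it

# Flattening to shape (n_samples,) is what such a check expects.
y_flat = y_train.ravel()
print(y_flat.shape)   # (1000,)
```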
To me, a (1000, 1) array should be considered 1D. But if the rest of pyglmnet consistently assumes that 1D arrays always have shape == (n_samples, ), then I guess that's what simulate_glm() should return.
The simplest way is to force the shapes of simulate_glm()'s return values. For example:
return np.squeeze(_lmb(distr, beta0, beta, X, eta))
and
return np.squeeze(y)
Sorry for the slow response.
I would not use squeeze because it's dangerous: you would not be able to distinguish between a shape of (1, 1000) and (1000, 1). However, I'm okay with a more careful check like the one in scikit-learn. Would you like to take a stab at this? Perhaps @tommyod can also help, since he introduced these checks.
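For illustration, here is a minimal sketch of such a "more careful check", loosely modeled on scikit-learn's column_or_1d utility (this is a standalone example, not pyglmnet's actual implementation):

```python
import numpy as np

def column_or_1d(y):
    """Accept shape (n,) or (n, 1); reject anything else.

    Unlike np.squeeze, this raises on a (1, n) row vector
    instead of silently flattening it.
    """
    y = np.asarray(y)
    if y.ndim == 1:
        return y
    if y.ndim == 2 and y.shape[1] == 1:
        return y.ravel()
    raise ValueError(
        f"y must have shape (n_samples,) or (n_samples, 1), got {y.shape}"
    )

# np.squeeze, by contrast, maps BOTH shapes to (1000,),
# losing the row-vector vs. column-vector distinction:
assert np.squeeze(np.ones((1000, 1))).shape == (1000,)
assert np.squeeze(np.ones((1, 1000))).shape == (1000,)
```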
I'll take a look at it.
1) pyglmnet uses (n, ) and never (n, 1), but sklearn typically wants (n, 1) and not (n, ) - do we really want to go against their API here? 2) As a further comment on this, I don't see any disadvantage of getting closer to the sklearn API, adding it as a dependency and using, for instance, its error-checking capabilities "for free". I'm all for minimizing dependencies in general, but a person doing GLM is likely testing other models and already has sklearn installed. Having a unified API would also make their life easier. Thoughts on points 1) and 2) above, @jasmainak?
do we really want to go against their API here?
nopes, let's try to be sklearn-like as much as possible. It will help with adoption.
I'm all for minimizing dependencies in general, but a person doing GLM is likely testing other models and has sklearn installed
Yes, I guess we could do this. But there isn't a strong coupling between the two projects at the moment to warrant it. Plus, sklearn's list of dependencies is growing by the day. To be clear, I am not against this, just not sure there is enough evidence that we need it at the moment.
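For reference, a rough sketch of what leaning on sklearn's validation "for free" could look like (this assumes scikit-learn is installed; it is not what pyglmnet currently does):

```python
import warnings

import numpy as np
from sklearn.utils.validation import check_X_y

X = np.random.rand(100, 5)
y_col = np.random.rand(100, 1)  # column vector, like the y_train above

# check_X_y validates shapes and dtypes; it flattens a (n, 1) y to (n,)
# and emits a DataConversionWarning rather than failing outright.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    X_checked, y_checked = check_X_y(X, y_col)

print(y_checked.shape)  # (100,)
```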
Why the long release cycle in general?
No particular reason, and we should make quicker releases; any help in achieving this would be welcome :) We changed the API quite a bit after 1.0 to make it more sklearn-like, which took time. Other than that, I agree shorter release cycles would be great!
The example code in the main README fails on the 3rd line. This is after installing pyglmnet with pip.