OHDSI / bayes-bridge

Bayesian sparse regression with regularized shrinkage and conjugate gradient acceleration
https://bayes-bridge.readthedocs.io/en/latest/
19 stars 7 forks source link

Change len() statements to .shape[0] for arrays in logistic model #14

Closed chinandrew closed 3 years ago

chinandrew commented 3 years ago

len() doesn't work with scipy sparse matrices (it raises TypeError: sparse matrix length is ambiguous; use getnnz() or shape[0]), so changing to use shape[0].

This just affects the logistic_model class for now. I think some of the other models may hit this issue as well, but I don't have a great sense of where they're potentially using sparse matrices or not. I can also just convert every len() in the model/ folder if that's safe to do.

aki-nishimura commented 3 years ago

Hm, I always intended n_success and n_trial to be numpy vectors and am not even sure if the Gibbs sampler would run correctly if they are scipy sparse matrices. (I document their types at a higher-level in RegressionModel but was lazy to document at lower levels.)

Anyways, I totally agree that the use of len() is an obstacle in trying out cupy and is a plain bad style. I find using .shape[0] using a one-dimensional array a bit confusing, so maybe .size?

chinandrew commented 3 years ago

Hm, I always intended n_success and n_trial to be numpy vectors and am not even sure if the Gibbs sampler would run correctly if they are scipy sparse matrices. (I document their types at a higher-level in RegressionModel but was lazy to document at lower levels.)

Anyways, I totally agree that the use of len() is an obstacle in trying out cupy and is a plain bad style. I find using .shape[0] using a one-dimensional array a bit confusing, so maybe .size?

Oh I see what I did. So I didn't actually hit this when running cupy tests, but rather when trying to use the gi_bleed data on the master branch, since y, X = dm.read_ohdsi_data() makes y into a sparse matrix, and len(y) fails. What I missed was the line y = np.squeeze(y.toarray()) that came right after that in the notebook that converts it to a numpy array. No wonder I was hitting so many issues :facepalm: . I'll close this for now; we can make changes for gpu support as needed.

aki-nishimura commented 3 years ago

y, X = dm.read_ohdsi_data() makes y into a sparse matrix, and len(y) fails. What I missed was the line y = np.squeeze(y.toarray()) that came right after that in the notebook that converts it to a numpy array

Ah, sorry you had to suffer from my crappy code. I can't remember, but I don't see any good reason why dm.read_ohdsi_data() should return a sparse matrix.