dylkot / cNMF

Code and example data for running Consensus Non-negative Matrix Factorization on single-cell RNA-Seq data
MIT License

Potential fix for issues #8, #10, #57 and #66 #71

Closed lucas-diedrich closed 4 months ago

lucas-diedrich commented 1 year ago

Thanks for creating this great package! I ran into the same problem as the authors of issues #8, #10, #57 and #66: running the consensus step on the factors raises the following TypeError from sklearn.decomposition.NMF:

TypeError: H should have the same dtype as X. Got H.dtype = float64

I found that the issue arises because anndata versions <0.9 implicitly convert the dtype of the anndata.AnnData.X attribute to np.float32 when the original dtype is np.float64. The converted X then no longer matches the float64 H reported in the error, which sklearn.decomposition.NMF rejects. This unexpected conversion was removed in anndata 0.9 and later.
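The mismatch can be sketched outside of cNMF. The snippet below is a minimal illustration, not the cNMF code path: the explicit `astype(np.float32)` stands in for the implicit conversion described above, the array shapes are arbitrary, and `nmf_error` is a hypothetical helper name. It assumes a scikit-learn version whose `non_negative_factorization` performs this dtype check when H is held fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
X64 = rng.random((6, 4))       # float64 counts-like matrix (cells x genes)
H = rng.random((2, 4))         # float64 spectra that are held fixed
X32 = X64.astype(np.float32)   # stand-in for the implicit float32 cast


def nmf_error(X, H):
    """Fit usages with H fixed; return the TypeError message, or None if the fit runs."""
    # Imported lazily so the arrays above can be inspected without scikit-learn.
    from sklearn.decomposition import non_negative_factorization

    try:
        non_negative_factorization(
            X, H=H, n_components=H.shape[0], update_H=False
        )
    except TypeError as err:
        return str(err)
    return None
```

Under these assumptions, `nmf_error(X32, H)` should return the "H should have the same dtype as X" message, while `nmf_error(X64, H)` should return `None`.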

I therefore suggest installing cNMF in an environment with anndata>=0.9. As a caveat, this requires Python >=3.8 (instead of the Python 3.7 recommended in the cNMF installation instructions). Still, this appears to be more convenient and more reproducible than changing the source code of external packages, as suggested previously.
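To check whether an existing environment already meets this requirement, something like the following could be used. It is only a sketch: `release_tuple`, `meets_minimum` and `installed_anndata_version` are hypothetical helper names, the version parsing is deliberately simple (it ignores pre-release suffixes such as `rc1`), and `importlib.metadata` itself needs Python 3.8 or newer.

```python
import re
from importlib import metadata  # standard library from Python 3.8 onwards

MIN_ANNDATA = (0, 9)  # first release line without the implicit float32 cast


def release_tuple(version):
    """Turn a version string such as '0.9.2' into (0, 9, 2)."""
    match = re.match(r"\d+(\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version, minimum=MIN_ANNDATA):
    """Return True if `version` is at least `minimum`."""
    return release_tuple(version) >= minimum


def installed_anndata_version():
    """Return the installed anndata version string, or None if it is absent."""
    try:
        return metadata.version("anndata")
    except metadata.PackageNotFoundError:
        return None
```

For the two environments tested here, `meets_minimum("0.8.0")` is `False` and `meets_minimum("0.9.2")` is `True`.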

Attached are jupyter notebooks and conda environment specifications with a minimal example based on the PBMC example exhibiting the described behavior.

| dtype \ Environment | anndata 0.9.2 / Python 3.11.3 | anndata 0.8.0 / Python 3.7.16 |
| --- | --- | --- |
| anndata.AnnData.X.dtype == np.float32 | Test run - Success | Test run - Success |
| anndata.AnnData.X.dtype == np.float64 | Test run - Success | Test run - Fails |

cnmf-combine-TypeError-reproducibility.zip

dylkot commented 9 months ago

Thank you!

dylkot commented 4 months ago

Adjusted in the latest development branch and will push to master and pypi shortly. Thanks again