ebi-gene-expression-group / scanpy-scripts

Scripts for using scanpy
Apache License 2.0

Error where n_comps > n_cells #77

Closed (pinin4fjords closed 4 years ago)

pinin4fjords commented 4 years ago

With v0.2.9, supplying a matrix with fewer cells than the n_comps setting produces an error. In the following example, input.h5 has 35 cells:

scanpy-run-pca --n-comps '50' --no-zero-center --svd-solver 'arpack' --random-state '1234'  --input-format 'anndata' input.h5   --show-obj stdout --output-format 'anndata' output.h5

This produces an error like the following:

/path/to/gxa_galaxy/_conda/envs/__scanpy-scripts@0.2.9/lib/python3.6/site-packages/anndata/base.py:17: FutureWarning: pandas.core.index is deprecated and will be removed in a future version.  The public classes are available in the top-level namespace.
  from pandas.core.index import RangeIndex
Traceback (most recent call last):
  File "/path/to/gxa_galaxy/_conda/envs/__scanpy-scripts@0.2.9/bin/scanpy-run-pca", line 10, in <module>
    sys.exit(PCA_CMD())
  File "/path/to/gxa_galaxy/_conda/envs/__scanpy-scripts@0.2.9/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/path/to/gxa_galaxy/_conda/envs/__scanpy-scripts@0.2.9/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/path/to/gxa_galaxy/_conda/envs/__scanpy-scripts@0.2.9/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/path/to/gxa_galaxy/_conda/envs/__scanpy-scripts@0.2.9/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/path/to/gxa_galaxy/_conda/envs/__scanpy-scripts@0.2.9/lib/python3.6/site-packages/scanpy_scripts/cmd_utils.py", line 43, in cmd
    func(adata, **kwargs)
  File "/path/to/gxa_galaxy/_conda/envs/__scanpy-scripts@0.2.9/lib/python3.6/site-packages/scanpy_scripts/lib/_pca.py", line 27, in pca
    sc.pp.pca(adata, **kwargs)
  File "/path/to/gxa_galaxy/_conda/envs/__scanpy-scripts@0.2.9/lib/python3.6/site-packages/scanpy/preprocessing/_simple.py", line 512, in pca
    adata.varm['PCs'][adata.var['highly_variable']] = pca_.components_.T
ValueError: shape mismatch: value array of shape (2157,35) could not be broadcast to indexing result of shape (2157,50)
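
The final line of the traceback can be reproduced outside scanpy. This is a minimal sketch with plain NumPy, assuming (as the shapes in the message suggest) that the loadings matrix is allocated with the requested 50 columns while the solver hands back only 35 components. The names `pcs`, `mask`, `components_t` and `assign_loadings` are illustrative stand-ins, not scanpy internals.

```python
import numpy as np

n_genes, n_cells, n_comps = 2157, 35, 50

# Loadings matrix sized for the *requested* number of components (assumption).
pcs = np.zeros((n_genes, n_comps))
# Stand-in for adata.var['highly_variable']; every gene is selected here.
mask = np.ones(n_genes, dtype=bool)
# Stand-in for pca_.components_.T when only n_cells components are returned.
components_t = np.ones((n_genes, n_cells))


def assign_loadings(pcs, mask, components_t):
    """Mimic the boolean-mask assignment seen on the failing line."""
    pcs[mask] = components_t


try:
    assign_loadings(pcs, mask, components_t)
except ValueError as err:
    # NumPy refuses to broadcast 35 columns into a 50-column selection.
    print("ValueError:", err)
```

The assignment fails because a (2157, 35) array cannot be broadcast into a (2157, 50) selection, which matches the shapes reported above.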

I understand that supplying 50 is nonsensical here, but it's the default in the workflow we have, and I'd like it not to break things in cases such as this.

@nh3 - would you be amenable to a PR analogous to https://github.com/ebi-gene-expression-group/scanpy-scripts/pull/76 (which covered the related issue of low gene numbers), one that automatically bumps n_comps down to the minimum of (default, n_cells, [ n_genes ? ])?
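
The proposed behaviour could look roughly like the sketch below. It is not the code from #76 or from any eventual PR: `clamp_n_comps` and its `strict` flag are hypothetical names introduced here for illustration. The `strict` option covers the possibility that a solver needs strictly fewer components than the smaller matrix dimension, which is an assumption rather than something established in this thread.

```python
def clamp_n_comps(n_comps, n_cells, n_genes, strict=False):
    """Return n_comps reduced to what the data can support.

    Hypothetical helper: takes the minimum of the requested value,
    the number of cells and the number of genes. With strict=True the
    upper bound is one less than the smaller dimension (an assumption
    about what some solvers require).
    """
    upper = min(n_cells, n_genes)
    if strict:
        upper -= 1
    if upper < 1:
        raise ValueError(
            f"Cannot compute any components for {n_cells} cells "
            f"and {n_genes} genes"
        )
    return min(n_comps, upper)


# The case from this issue: 35 cells, 2157 genes, default of 50.
print(clamp_n_comps(50, n_cells=35, n_genes=2157))
# A larger dataset leaves the default untouched.
print(clamp_n_comps(50, n_cells=1000, n_genes=2157))
```

A wrapper could call such a helper before passing `n_comps` on to `sc.pp.pca`, ideally logging a warning whenever the value is lowered so the change is visible in workflow output.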

nh3 commented 4 years ago

Yeah, a PR of that kind is welcome. Have you figured out why you need an even smaller n_comps in #76?

pinin4fjords commented 4 years ago

Thanks @nh3, okay, will do. I've been away for the intervening two weeks, so I haven't figured out the other thing. But low cell numbers are a bigger problem for us right now than low gene numbers.