ebi-gene-expression-group / scanpy-scripts

Scripts for using scanpy
Apache License 2.0
Scrublet module fails when running with fewer than 30 cells #124

Open pcm32 opened 8 months ago

pcm32 commented 8 months ago

I guess this is not unexpected:

Traceback (most recent call last):
  File "/usr/local/bin/scanpy-multiplet", line 10, in <module>
    sys.exit(multiplet())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/scanpy_scripts/cmd_utils.py", line 46, in cmd
    func(adata, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/scanpy_scripts/lib/_scrublet.py", line 24, in scrublet
    sce.pp.scrublet(adata, adata_sim=adata_sim, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/scanpy/external/pp/_scrublet.py", line 239, in scrublet
    scrubbed = [
  File "/usr/local/lib/python3.9/site-packages/scanpy/external/pp/_scrublet.py", line 240, in <listcomp>
    _run_scrublet(
  File "/usr/local/lib/python3.9/site-packages/scanpy/external/pp/_scrublet.py", line 210, in _run_scrublet
    ad_obs = _scrublet_call_doublets(
  File "/usr/local/lib/python3.9/site-packages/scanpy/external/pp/_scrublet.py", line 434, in _scrublet_call_doublets
    sl.pipeline_pca(
  File "/usr/local/lib/python3.9/site-packages/scrublet/helper_functions.py", line 91, in pipeline_pca
    pca = PCA(n_components=n_prin_comps, random_state=random_state, svd_solver=svd_solver).fit(X_obs)
  File "/usr/local/lib/python3.9/site-packages/sklearn/decomposition/_pca.py", line 382, in fit
    self._fit(X)
  File "/usr/local/lib/python3.9/site-packages/sklearn/decomposition/_pca.py", line 459, in _fit
    return self._fit_truncated(X, n_components, self._fit_svd_solver)
  File "/usr/local/lib/python3.9/site-packages/sklearn/decomposition/_pca.py", line 543, in _fit_truncated
    raise ValueError(
ValueError: n_components=30 must be between 1 and min(n_samples, n_features)=21 with svd_solver='arpack'

The PCA step asks for n_components=30, which is more than the min(n_samples, n_features)=21 available here. We should add some logic that either filters out samples with fewer than 30 cells before running Scrublet, or lets the user indicate that the run should fail like this when it gets to this point.
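A minimal sketch of what such a guard could look like, assuming the check is done on per-sample cell counts before Scrublet is called. The names `small_batches`, `check_batches` and the `on_small` option are hypothetical and not part of scanpy-scripts or scanpy. The sketch only looks at cell counts (not the number of features) and conservatively requires strictly more cells than principal components.

```python
from collections import Counter


def small_batches(batch_labels, n_prin_comps=30):
    """Return {batch: n_cells} for batches too small for the PCA step.

    Hypothetical helper. Conservative rule: a batch needs strictly more
    cells than the number of principal components requested.
    """
    counts = Counter(batch_labels)
    return {batch: n for batch, n in counts.items() if n <= n_prin_comps}


def check_batches(batch_labels, n_prin_comps=30, on_small="skip"):
    """Return the batches that are safe to pass on to Scrublet.

    on_small="skip": silently drop batches that are too small.
    on_small="fail": raise a ValueError naming the offending batches.
    """
    if on_small not in ("skip", "fail"):
        raise ValueError("on_small must be 'skip' or 'fail'")

    too_small = small_batches(batch_labels, n_prin_comps)
    if too_small and on_small == "fail":
        details = ", ".join(f"{b} ({n} cells)" for b, n in too_small.items())
        raise ValueError(
            f"Batches with too few cells for n_prin_comps={n_prin_comps}: {details}"
        )

    # Counter keeps first-seen order, so the result follows the input order.
    return [b for b in Counter(batch_labels) if b not in too_small]


# Example: one sample with 21 cells, one with 100 cells.
labels = ["sample_a"] * 21 + ["sample_b"] * 100
print(check_batches(labels))  # ['sample_b']
```

With `on_small="skip"` the small sample is dropped before doublet detection. With `on_small="fail"` the run stops early with a message that names the sample, rather than failing deep inside the PCA call.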