ebi-gene-expression-group / scanpy-scripts

Scripts for using scanpy
Apache License 2.0

Scrublet failing on certain dataset #134

Open anilthanki opened 4 months ago

anilthanki commented 4 months ago

Scanpy Scrublet (v1.9.3+galaxy0) fails on certain datasets with the error shown below. It runs successfully when the number of principal components (--n-pcs) is reduced.

Running Scrublet
filtered out 6821 genes that are detected in less than 3 cells
normalizing counts per cell
    finished (0:00:00)
extracting highly variable genes
    finished (0:00:01)
--> added
    'highly_variable', boolean vector (adata.var)
    'means', float vector (adata.var)
    'dispersions', float vector (adata.var)
    'dispersions_norm', float vector (adata.var)
normalizing counts per cell
    finished (0:00:00)
normalizing counts per cell
    finished (0:00:00)
Embedding transcriptomes using PCA...
filtered out 4409 genes that are detected in less than 3 cells
normalizing counts per cell
    finished (0:00:00)
extracting highly variable genes
    finished (0:00:00)
--> added
    'highly_variable', boolean vector (adata.var)
    'means', float vector (adata.var)
    'dispersions', float vector (adata.var)
    'dispersions_norm', float vector (adata.var)
normalizing counts per cell
    finished (0:00:00)
normalizing counts per cell
    finished (0:00:00)
Embedding transcriptomes using PCA...
filtered out 6393 genes that are detected in less than 3 cells
normalizing counts per cell
    finished (0:00:00)
extracting highly variable genes
    finished (0:00:00)
--> added
    'highly_variable', boolean vector (adata.var)
    'means', float vector (adata.var)
    'dispersions', float vector (adata.var)
    'dispersions_norm', float vector (adata.var)
normalizing counts per cell
    finished (0:00:00)
normalizing counts per cell
    finished (0:00:00)
Embedding transcriptomes using PCA...
filtered out 6520 genes that are detected in less than 3 cells
normalizing counts per cell
    finished (0:00:00)
extracting highly variable genes
    finished (0:00:00)
--> added
    'highly_variable', boolean vector (adata.var)
    'means', float vector (adata.var)
    'dispersions', float vector (adata.var)
    'dispersions_norm', float vector (adata.var)
normalizing counts per cell
    finished (0:00:00)
normalizing counts per cell
    finished (0:00:00)
Embedding transcriptomes using PCA...
filtered out 7453 genes that are detected in less than 3 cells
normalizing counts per cell
    finished (0:00:00)
extracting highly variable genes
    finished (0:00:00)
--> added
    'highly_variable', boolean vector (adata.var)
    'means', float vector (adata.var)
    'dispersions', float vector (adata.var)
    'dispersions_norm', float vector (adata.var)
normalizing counts per cell
    finished (0:00:00)
normalizing counts per cell
    finished (0:00:00)
Embedding transcriptomes using PCA...
filtered out 8535 genes that are detected in less than 3 cells
normalizing counts per cell
    finished (0:00:00)
extracting highly variable genes
    finished (0:00:00)
--> added
    'highly_variable', boolean vector (adata.var)
    'means', float vector (adata.var)
    'dispersions', float vector (adata.var)
    'dispersions_norm', float vector (adata.var)
normalizing counts per cell
    finished (0:00:00)
normalizing counts per cell
    finished (0:00:00)
Embedding transcriptomes using PCA...
filtered out 18975 genes that are detected in less than 3 cells
normalizing counts per cell
    finished (0:00:00)
extracting highly variable genes
    finished (0:00:00)
--> added
    'highly_variable', boolean vector (adata.var)
    'means', float vector (adata.var)
    'dispersions', float vector (adata.var)
    'dispersions_norm', float vector (adata.var)
normalizing counts per cell
    finished (0:00:00)
normalizing counts per cell
    finished (0:00:00)
Embedding transcriptomes using PCA...
Traceback (most recent call last):
  File "/usr/local/bin/scanpy-cli", line 10, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/scanpy_scripts/cmd_utils.py", line 49, in cmd
    func(adata, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/scanpy_scripts/lib/_scrublet.py", line 26, in scrublet
    sce.pp.scrublet(adata, adata_sim=adata_sim, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/scanpy/external/pp/_scrublet.py", line 239, in scrublet
    scrubbed = [
  File "/usr/local/lib/python3.9/site-packages/scanpy/external/pp/_scrublet.py", line 240, in <listcomp>
    _run_scrublet(
  File "/usr/local/lib/python3.9/site-packages/scanpy/external/pp/_scrublet.py", line 210, in _run_scrublet
    ad_obs = _scrublet_call_doublets(
  File "/usr/local/lib/python3.9/site-packages/scanpy/external/pp/_scrublet.py", line 439, in _scrublet_call_doublets
    sl.pipeline_pca(
  File "/usr/local/lib/python3.9/site-packages/scrublet/helper_functions.py", line 91, in pipeline_pca
    pca = PCA(n_components=n_prin_comps, random_state=random_state, svd_solver=svd_solver).fit(X_obs)
  File "/usr/local/lib/python3.9/site-packages/sklearn/decomposition/_pca.py", line 435, in fit
    self._fit(X)
  File "/usr/local/lib/python3.9/site-packages/sklearn/decomposition/_pca.py", line 514, in _fit
    return self._fit_truncated(X, n_components, self._fit_svd_solver)
  File "/usr/local/lib/python3.9/site-packages/sklearn/decomposition/_pca.py", line 587, in _fit_truncated
    raise ValueError(
ValueError: n_components=30 must be between 1 and min(n_samples, n_features)=21 with svd_solver='arpack'
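
The final error says that 30 components were requested, but the matrix passed to PCA in the last run has min(n_samples, n_features) = 21, which would explain why lowering --n-pcs avoids the failure. As a rough sketch of a possible guard (the helper names `max_safe_n_pcs` and `clamp_n_pcs` are hypothetical and not part of scanpy-scripts, scanpy, or scrublet), the requested value could be capped by the matrix shape before PCA is called. This assumes scikit-learn's rule that the 'arpack' solver needs n_components strictly below that minimum:

```python
def max_safe_n_pcs(n_samples: int, n_features: int, svd_solver: str = "arpack") -> int:
    """Largest n_components that PCA should accept for a matrix of this shape.

    Hypothetical helper for illustration only.
    """
    limit = min(n_samples, n_features)
    # Assumption: with svd_solver='arpack', scikit-learn requires
    # n_components to be strictly less than min(n_samples, n_features).
    if svd_solver == "arpack":
        limit -= 1
    return limit


def clamp_n_pcs(requested: int, n_samples: int, n_features: int,
                svd_solver: str = "arpack") -> int:
    """Cap the requested number of PCs so that it fits the data."""
    cap = max_safe_n_pcs(n_samples, n_features, svd_solver)
    if cap < 1:
        raise ValueError(
            f"Matrix of shape ({n_samples}, {n_features}) is too small for PCA"
        )
    return min(requested, cap)


# 30 requested components and a minimum dimension of 21 come from the traceback.
# n_features=500 is an illustrative value; only the minimum (21) is in the log.
print(clamp_n_pcs(30, n_samples=21, n_features=500))  # 20
```

Silently lowering the number of components changes the embedding used for doublet scoring, so a warning, or skipping very small batches altogether, may be preferable to clamping without notice.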