ebi-gene-expression-group / scanpy-scripts

Scripts for using scanpy
Apache License 2.0

Scrublet doublet prediction fails with cells with under 5 expressed genes #125

Open pcm32 opened 8 months ago

pcm32 commented 8 months ago

In case you get an error like this from the scrublet call:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.venv/lib/python3.10/site-packages/scanpy/external/pp/_scrublet.py", line 252, in scrublet
    adata.obs = scrubbed_obs.loc[adata.obs_names.values]
  File "/.venv/lib/python3.10/site-packages/pandas/core/indexing.py", line 1103, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/.venv/lib/python3.10/site-packages/pandas/core/indexing.py", line 1332, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "/.venv/lib/python3.10/site-packages/pandas/core/indexing.py", line 1272, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis)
  File "/.venv/lib/python3.10/site-packages/pandas/core/indexing.py", line 1462, in _get_listlike_indexer
    keyarr, indexer = ax._get_indexer_strict(key, axis_name)
  File "/.venv/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 5876, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/.venv/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 5938, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['ACCGTAAAGGGTATCG-1', 'ACGATGTAGACAAGCC-1', 'AGATTGCCATATACGC-1', 'CTCGTACGTAGCTCCG-1', 'CTCGTCAGTAACGCGA-1', 'GCTCCTATCACCTTAT-1', 'GTAGTCACAATGAAAC-1', 'GTTCGGGGTCTTGCGG-1', 'TACTTACTCCCAAGTA-1', 'TAGTTGGCATGCGCAC-1', 'TCATTACAGCTCAACT-1', 'CACACTCAGTCAAGCG-1', 'ACATGGTTCAGCTTAG-1', 'ATCCGAAAGGTGTGGT-1', 'CCTACACTCGATAGAA-1', 'CGAACATAGGGATCTG-1', 'CGTGAGCCAATGAATG-1', 'GGAAAGCAGACAGGCT-1', 'GGGATGAAGATGAGAG-1', 'GGGATGACATACGCTA-1', 'TAAGTGCAGCTGAACG-1', 'TATGCCCGTGCTTCTC-1', 'TCATTTGCATAGGATA-1', 'TCCCGATCAGGGTATG-1', 'TTTGGTTAGGCTAGGT-1', 'GCATGATTCACCTTAT-1', 'CACACAAAGTATCGAA-1'] not in index"

This is due to those cells having fewer than 5 expressed genes. Filtering cells by number of expressed genes before the scrublet call resolves it (a minimum of 5 genes worked for me).
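
For anyone who wants to try the workaround, here is a minimal sketch, not a definitive fix. It assumes the barcodes listed in the KeyError are exactly the cells with fewer than 5 expressed genes, as described above. `expressed_gene_mask` and `scrublet_after_filtering` are hypothetical helper names introduced for illustration; `sc.pp.filter_cells` and `sc.external.pp.scrublet` are the scanpy calls involved, the latter being the one in the traceback. The toy matrix is made up to show which cells the threshold keeps.

```python
import numpy as np


def expressed_gene_mask(X, min_genes=5):
    """Boolean mask of cells (rows) with at least `min_genes` genes with a count > 0."""
    # Works for dense arrays and scipy sparse matrices alike.
    genes_per_cell = np.asarray((X > 0).sum(axis=1)).ravel()
    return genes_per_cell >= min_genes


def scrublet_after_filtering(adata, min_genes=5):
    """Hypothetical wrapper: drop low-gene cells first, then run Scrublet."""
    import scanpy as sc  # imported lazily so the mask helper works without scanpy

    # Removes cells with fewer than `min_genes` expressed genes, in place.
    sc.pp.filter_cells(adata, min_genes=min_genes)
    # Same entry point as in the traceback; newer scanpy releases may expose it elsewhere.
    sc.external.pp.scrublet(adata)
    return adata


# Toy counts matrix (made-up data): 3 cells x 6 genes.
counts = np.array([
    [1, 0, 2, 3, 1, 4],  # 5 expressed genes: kept
    [0, 0, 1, 0, 0, 2],  # 2 expressed genes: dropped
    [3, 1, 1, 2, 5, 1],  # 6 expressed genes: kept
])
mask = expressed_gene_mask(counts)
print(mask)
```

The mask helper is only there to check beforehand how many cells would be removed, e.g. `(~expressed_gene_mask(adata.X)).sum()`. The threshold of 5 is the value that worked here; other datasets may need a different one.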