bvaldebenitom / SoloTE

GNU General Public License v3.0

Scanpy error when using sc.read_10x_mtx() #22

Closed wbrett87 closed 1 year ago

wbrett87 commented 1 year ago

Hello!

Thanks so much for the awesome work you've done! I finished running the pipeline and am now trying to load the output into an AnnData object with Scanpy, but I get the following error. Any help would be greatly appreciated!

Thanks


    KeyError                                  Traceback (most recent call last)
    File ~/mambaforge/envs/scanpy/lib/python3.10/site-packages/pandas/core/indexes/base.py:3652, in Index.get_loc(self, key)
       3651 try:
    -> 3652     return self._engine.get_loc(casted_key)
       3653 except KeyError as err:

    File ~/mambaforge/envs/scanpy/lib/python3.10/site-packages/pandas/_libs/index.pyx:147, in pandas._libs.index.IndexEngine.get_loc()

    File ~/mambaforge/envs/scanpy/lib/python3.10/site-packages/pandas/_libs/index.pyx:176, in pandas._libs.index.IndexEngine.get_loc()

    File pandas/_libs/hashtable_class_helper.pxi:2606, in pandas._libs.hashtable.Int64HashTable.get_item()

    File pandas/_libs/hashtable_class_helper.pxi:2630, in pandas._libs.hashtable.Int64HashTable.get_item()

    KeyError: 2

    The above exception was the direct cause of the following exception:

    KeyError                                  Traceback (most recent call last)
    Cell In [13], line 1
    ----> 1 adata = sc.read_10x_mtx(".")

    File ~/mambaforge/envs/scanpy/lib/python3.10/site-packages/scanpy/readwrite.py:490, in read_10x_mtx(path, var_names, make_unique, cache, cache_compression, gex_only, prefix)
        488 genefile_exists = (path / f'{prefix}genes.tsv').is_file()
        489 read = _read_legacy_10x_mtx if genefile_exists else _read_v3_10x_mtx
    --> 490 adata = read(
        491     str(path),
        492     var_names=var_names,
        493     make_unique=make_unique,
        494     cache=cache,
        495     cache_compression=cache_compression,
        496     prefix=prefix,
        497 )
        498 if genefile_exists or not gex_only:
        499     return adata

    File ~/mambaforge/envs/scanpy/lib/python3.10/site-packages/scanpy/readwrite.py:571, in _read_v3_10x_mtx(path, var_names, make_unique, cache, cache_compression, prefix)
        569 else:
        570     raise ValueError("var_names needs to be 'gene_symbols' or 'gene_ids'")
    --> 571 adata.var['feature_types'] = genes[2].values
        572 adata.obs_names = pd.read_csv(path / f'{prefix}barcodes.tsv.gz', header=None)[
        573     0
        574 ].values
        575 return adata

    File ~/mambaforge/envs/scanpy/lib/python3.10/site-packages/pandas/core/frame.py:3761, in DataFrame.__getitem__(self, key)
       3759 if self.columns.nlevels > 1:
       3760     return self._getitem_multilevel(key)
    -> 3761 indexer = self.columns.get_loc(key)
       3762 if is_integer(indexer):
       3763     indexer = [indexer]

    File ~/mambaforge/envs/scanpy/lib/python3.10/site-packages/pandas/core/indexes/base.py:3654, in Index.get_loc(self, key)
       3652     return self._engine.get_loc(casted_key)
       3653 except KeyError as err:
    -> 3654     raise KeyError(key) from err
       3655 except TypeError:
       3656     # If we have a listlike key, _check_indexing_error will raise
       3657     # InvalidIndexError. Otherwise we fall through and re-raise
       3658     # the TypeError.
       3659     self._check_indexing_error(key)

    KeyError: 2

bvaldebenitom commented 1 year ago

Hi @wbrett87!

What version of scanpy are you using?
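
If it helps, here is a minimal sketch for collecting the installed versions using only the Python standard library. The `report_versions` helper and its default package list are made up for illustration; Scanpy's own `sc.logging.print_header()` should give similar information if it is available in your install.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages=("scanpy", "anndata", "pandas", "numpy", "scipy")):
    """Return {package name: installed version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Package is not installed in the current environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Print one "name==version" entry per package, e.g. for pasting into an issue.
    for name, ver in report_versions().items():
        print(f"{name}=={ver}")
```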

wbrett87 commented 1 year ago

scanpy==1.9.3 anndata==0.9.1 umap==0.5.3 numpy==1.24.3 scipy==1.10.1 pandas==2.0.2 scikit-learn==1.2.2 statsmodels==0.14.0 python-igraph==0.10.3 pynndescent==0.5.10

bvaldebenitom commented 1 year ago

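I got the output to load into Scanpy with the steps below. The traceback shows that Scanpy's reader for this layout expects a third column (the feature type) in features.tsv and looks for gzipped files such as barcodes.tsv.gz, so both need fixing. Please run the following commands and let me know if it works: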

  1. First, fix the features.tsv file:

    cp features.tsv features.tsv.backup
    awk 'BEGIN{FS=OFS="\t"}{print $1,$2,"Gene Expression"}' features.tsv.backup > features.tsv

  2. Compress files with gzip:

    gzip features.tsv
    gzip matrix.mtx
    gzip barcodes.tsv
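
If you would rather stay in Python, the two steps above can be sketched roughly as follows. This is only an illustration using the standard library: the `prepare_for_scanpy` name is made up, and it assumes the SoloTE output directory contains uncompressed features.tsv, matrix.mtx and barcodes.tsv with a tab-separated, two-column features file.

```python
import gzip
import shutil
from pathlib import Path


def prepare_for_scanpy(out_dir, feature_type="Gene Expression"):
    """Add a feature-type column to features.tsv, then gzip the three matrix files."""
    out_dir = Path(out_dir)
    features = out_dir / "features.tsv"

    # Step 1: keep a backup, then rewrite features.tsv with a third column.
    shutil.copy(features, out_dir / "features.tsv.backup")
    fixed_lines = []
    for line in features.read_text().splitlines():
        # Like the awk command: keep the first two columns, set the third.
        # Pad with empty strings in case a line has fewer than two columns.
        cols = (line.split("\t") + ["", ""])[:2]
        fixed_lines.append("\t".join(cols + [feature_type]))
    features.write_text("\n".join(fixed_lines) + "\n")

    # Step 2: compress each file and remove the uncompressed copy, like `gzip <file>`.
    for name in ("features.tsv", "matrix.mtx", "barcodes.tsv"):
        src = out_dir / name
        with open(src, "rb") as f_in, gzip.open(f"{src}.gz", "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
        src.unlink()
```

After calling `prepare_for_scanpy(".")` in the output directory, `sc.read_10x_mtx(".")` should find the gzipped files and the third column it was missing.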

wbrett87 commented 1 year ago

Works like a charm! Thanks so much!