atarashansky / SAMap

SAMap: Mapping single-cell RNA sequencing datasets from evolutionarily distant organisms.
MIT License

"KeyError: 'dict'" with reproducible example #78

Closed: silastittes closed this issue 2 years ago

silastittes commented 2 years ago

Hi, I'm running into issues loading h5ad files to a SAM object. The full error message is:

KeyError                                  Traceback (most recent call last)
~/anaconda3/envs/samap/lib/python3.7/site-packages/anndata/_io/utils.py in func_wrapper(elem, *args, **kwargs)
    176         try:
--> 177             return func(elem, *args, **kwargs)
    178         except Exception as e:

~/anaconda3/envs/samap/lib/python3.7/site-packages/anndata/_io/h5ad.py in read_group(group)
    526     if encoding_type:
--> 527         EncodingVersions[encoding_type].check(
    528             group.name, group.attrs["encoding-version"]

~/anaconda3/envs/samap/lib/python3.7/enum.py in __getitem__(cls, name)
    356     def __getitem__(cls, name):
--> 357         return cls._member_map_[name]
    358 

KeyError: 'dict'

During handling of the above exception, another exception occurred:

AnnDataReadError                          Traceback (most recent call last)
~/tmp/ipykernel_45306/1419237555.py in <module>
      1 fn1 = "tutorial.h5ad"
      2 sam1 = SAM()
...
--> 184                     f"Above error raised while reading key {elem.name!r} of "
    185                     f"type {type(elem)} from {parent}."
    186                 )

AnnDataReadError: Above error raised while reading key '/layers' of type <class 'h5py._hl.group.Group'> from /.

The same error occurs with both my data of interest and the example data from scanpy.
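The `KeyError: 'dict'` is raised when anndata's reader looks up the `encoding-type` attribute of `/layers` in its `EncodingVersions` enum and finds no `'dict'` member. One way to see which encoding attributes a file actually carries is to open it with h5py, which is already pinned in the samap environment listed below. This is a minimal diagnostic sketch. The helper names `top_level_encodings` and `written_with_new_spec` are hypothetical, and treating a `'dict'` encoding as the marker of a newer anndata on-disk format is an assumption inferred from the traceback.

```python
import h5py


def top_level_encodings(path):
    """Map each top-level element of an .h5ad file to its 'encoding-type' attribute.

    Elements that have no such attribute map to None.
    """
    encodings = {}
    with h5py.File(path, "r") as f:
        for key in f.keys():
            enc = f[key].attrs.get("encoding-type")
            # Older h5py versions may return string attributes as bytes.
            if isinstance(enc, bytes):
                enc = enc.decode("utf-8")
            encodings[key] = enc
    return encodings


def written_with_new_spec(path):
    """Heuristic (assumption): a top-level group encoded as 'dict' indicates the newer format."""
    return "dict" in top_level_encodings(path).values()
```

For example, `print(top_level_encodings("tutorial.h5ad"))` should show whether `/layers` is tagged `'dict'`. If it is, and the environment that reads the file raises the error above, then the writer and the reader disagree on the file format.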

I'm following the first few steps of the scanpy PBMC 3k tutorial:

(command line)

mkdir tut_data
wget http://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz -O tut_data/pbmc3k_filtered_gene_bc_matrices.tar.gz
cd tut_data; tar -xzf pbmc3k_filtered_gene_bc_matrices.tar.gz
mkdir write

(python)

tutdata = sc.read_10x_mtx(
    'tut_data/filtered_gene_bc_matrices/hg19/',  # the directory with the `.mtx` file
    var_names='gene_symbols',                # use gene symbols for the variable names (variables-axis index)
    cache=True) 
tutdata.write("tutorial.h5ad")

(python)

from samap.mapping import SAMAP
from samap.analysis import (get_mapping_scores, GenePairFinder, transfer_annotations,
                            sankey_plot, chord_plot, CellTypeTriangles, 
                            ParalogSubstitutions, FunctionalEnrichment,
                            convert_eggnog_to_homologs, GeneTriangles)
from samalg import SAM
import pandas as pd

The error occurs at this step, when I try to load the newly created h5ad file.

(python)

fn1 = "tutorial.h5ad"
sam1 = SAM()
sam1.load_data(fn1)

silastittes commented 2 years ago

Sorry, I forgot the system info: Ubuntu 18.04.4 LTS. I'm using two separate conda environments because I couldn't get the required versions to work together. For the scanpy steps, I'm building the environment with the following YAML:

name: sc
channels:
  - conda-forge
  - anaconda
  - bioconda
dependencies:
  - ipykernel
  - scanpy
  - numpy

Resulting versions: scanpy==1.9.1 anndata==0.8.0 umap==0.5.3 numpy==1.22.4 scipy==1.8.1 pandas==1.4.2 scikit-learn==1.1.1 statsmodels==0.13.2 pynndescent==0.5.7

For SAMap, I followed the steps in the README using the following environment:

name: samap
channels:
  - conda-forge
  - anaconda
  - bioconda
dependencies:
  - python=3.7 
  - pip 
  - pybind11 
  - h5py=2.10.0 
  - leidenalg 
  - python-igraph 
  - texttable
  - ipykernel
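The two environments probably disagree on the anndata version. The scanpy environment resolved anndata 0.8.0. The `EncodingVersions[...]` check in the traceback suggests that the samap environment has an older anndata, although its exact version isn't listed above. The sketch below compares two version strings. It assumes that 0.8.0 is the first release that writes the newer format. `parse_version` and `h5ad_compatible` are hypothetical helpers. In practice, you would get the two versions by running `import anndata; print(anndata.__version__)` in each environment.

```python
def parse_version(v):
    """Parse the leading numeric components of a version string, e.g. '0.8.0' -> (0, 8, 0)."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break  # stop at suffixes such as 'rc1'
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


# Assumption: anndata 0.8.0 is the first release writing the newer on-disk
# format (e.g. encoding-type "dict" on groups such as /layers).
NEW_SPEC = (0, 8)


def h5ad_compatible(writer_version, reader_version):
    """Return False when the writer uses the newer format but the reader predates it."""
    writer_new = parse_version(writer_version) >= NEW_SPEC
    reader_new = parse_version(reader_version) >= NEW_SPEC
    return not (writer_new and not reader_new)
```

For example, `h5ad_compatible("0.8.0", "0.7.6")` returns `False`, which is the situation that the traceback above points to.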
silastittes commented 2 years ago

Just wanted to follow up on this. Is there an alternative way to convert Chromium 10X output into h5ad files that SAMap can read? Perhaps there are intermediate steps I've overlooked? Version issues? I'm very interested in applying the SAMap methods. Any help would be greatly appreciated!

atarashansky commented 2 years ago

Hi there - I had stopped getting notifications from this repo for some reason. So sorry about the delay.

Could you try to generate the h5ad file in the same environment where you installed SAMap? That environment should already have all the dependencies needed to use scanpy to read in the 10x output.
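Concretely, the suggestion is to run both halves of the original steps in one environment, so that the same anndata writes and reads the file. Below is a minimal sketch that assumes scanpy and samalg are both importable in the samap environment. The function names are hypothetical. The calls `sc.read_10x_mtx`, `AnnData.write`, and `SAM.load_data` are the ones already used in this thread.

```python
def make_h5ad_for_samap(mtx_dir, out_path):
    """Read 10x Chromium output and write it as .h5ad.

    Intended to run inside the samap environment, so the anndata that writes
    the file is the same one SAM.load_data later uses to read it.
    """
    # Imported lazily: whichever environment calls this supplies scanpy/anndata.
    import anndata
    import scanpy as sc

    adata = sc.read_10x_mtx(mtx_dir, var_names="gene_symbols", cache=True)
    adata.write(out_path)
    return anndata.__version__  # record which anndata version wrote the file


def load_into_sam(h5ad_path):
    """Load an .h5ad file into a SAM object (run in the same environment as above)."""
    from samalg import SAM

    sam = SAM()
    sam.load_data(h5ad_path)
    return sam
```

Usage, from the samap environment: `make_h5ad_for_samap("tut_data/filtered_gene_bc_matrices/hg19/", "tutorial.h5ad")` followed by `sam1 = load_into_sam("tutorial.h5ad")`.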

Meanwhile, I will investigate what needs to be updated in SAMap to make it compatible with the latest version of scanpy.

silastittes commented 2 years ago

Thanks for the suggestion! I should have realized scanpy was a dependency. Using the same conda environment both to generate the h5ad file and to run SAMap works with the example data. I'm running into a different issue with my data of interest, but I will close this issue because I don't think the problems are related. I'll look through previous issues before opening a new one, in case there's already a solution. Thanks!

avianalter commented 1 year ago

If you are using SAMap in Docker (https://hub.docker.com/r/tarashan/samap) but need to reprocess your data in the same environment as the Docker image, I've pushed a Docker image that should have the software to do so (#115).