bunnech / cellot

Learning Single-Cell Perturbation Responses using Neural Optimal Transport
BSD 3-Clause "New" or "Revised" License

Glioblastoma dataset: failing to read the anndata #30

Closed: JeanRadig closed this issue 2 months ago

JeanRadig commented 2 months ago

The glioblastoma data is corrupted.

The data provided at https://polybox.ethz.ch/index.php/s/RAykIMfDl0qCJaM cannot be opened with sc.read_h5ad.

The old dataset, which did not include the glioblastoma data, was available here: https://www.research-collection.ethz.ch/handle/20.500.11850/609681.

We get the following error when trying to load either of these files:

hvg-top1k-train-only.h5ad
hvg-train-only.h5ad
KeyError                                  Traceback (most recent call last)
/opt/conda/envs/cellot/lib/python3.9/site-packages/anndata/_io/utils.py in func_wrapper(elem, *args, **kwargs)
    176         try:
--> 177             return func(elem, *args, **kwargs)
    178         except Exception as e:

/opt/conda/envs/cellot/lib/python3.9/site-packages/anndata/_io/h5ad.py in read_group(group)
    526     if encoding_type:
--> 527         EncodingVersions[encoding_type].check(
    528             group.name, group.attrs["encoding-version"]

/opt/conda/envs/cellot/lib/python3.9/enum.py in __getitem__(cls, name)
    407     def __getitem__(cls, name):
--> 408         return cls._member_map_[name]
    409 

KeyError: 'dict'

During handling of the above exception, another exception occurred:

AnnDataReadError                          Traceback (most recent call last)
<ipython-input-4-c10317dd2048> in <module>
----> 1 gliob = sc.read_h5ad('/workspace/projects/cellot/newest_datasets/datasets_revision/scrna-gbm/hvg-train-only.h5ad')

/opt/conda/envs/cellot/lib/python3.9/site-packages/anndata/_io/h5ad.py in read_h5ad(filename, backed, as_sparse, as_sparse_fmt, chunk_size)
    419                 d[k] = read_dataframe(f[k])
    420             else:  # Base case
--> 421                 d[k] = read_attribute(f[k])
    422 
    423         d["raw"] = _read_raw(f, as_sparse, rdasp)

/opt/conda/envs/cellot/lib/python3.9/functools.py in wrapper(*args, **kw)
    875                             '1 positional argument')
    876 
--> 877         return dispatch(args[0].__class__)(*args, **kw)
    878 
    879     funcname = getattr(func, '__name__', 'singledispatch function')

/opt/conda/envs/cellot/lib/python3.9/site-packages/anndata/_io/utils.py in func_wrapper(elem, *args, **kwargs)
    181             else:
    182                 parent = _get_parent(elem)
--> 183                 raise AnnDataReadError(
    184                     f"Above error raised while reading key {elem.name!r} of "
    185                     f"type {type(elem)} from {parent}."

AnnDataReadError: Above error raised while reading key '/layers' of type <class 'h5py._hl.group.Group'> from /.

I have tried to work around this by looping through the data and cleaning it before recreating an AnnData object, but that also failed.
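The KeyError: 'dict' in the traceback means the installed anndata reader did not recognise the encoding-type "dict" declared on the '/layers' group. Before rebuilding the object by hand, it can help to list which encoding-type and encoding-version attributes the file actually declares. The following is a minimal sketch, assuming h5py is installed. collect_encodings and scan_h5ad are hypothetical helper names, not part of anndata or scanpy.

```python
from typing import Dict, Optional, Tuple


def _as_str(value) -> Optional[str]:
    """Decode an HDF5 attribute value to str (h5py may return bytes or str)."""
    if value is None:
        return None
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return str(value)


def collect_encodings(root) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each HDF5 path to its (encoding-type, encoding-version) attributes.

    `root` is any h5py-like group: it needs `.attrs` and `.visititems(func)`.
    Objects without an encoding-type attribute are skipped.
    """
    found: Dict[str, Tuple[Optional[str], Optional[str]]] = {}

    def record(name, obj):
        enc_type = _as_str(obj.attrs.get("encoding-type"))
        enc_version = _as_str(obj.attrs.get("encoding-version"))
        if enc_type is not None:
            found["/" + name] = (enc_type, enc_version)
        # Returning None tells visititems to keep walking the tree.
        return None

    # visititems does not visit the root itself, so record it explicitly.
    record("", root)
    root.visititems(record)
    return found


def scan_h5ad(path: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Open an .h5ad file read-only and collect its declared encodings."""
    import h5py  # imported lazily so collect_encodings works on any h5py-like group

    with h5py.File(path, "r") as f:
        return collect_encodings(f)
```

For example, iterating over scan_h5ad("hvg-train-only.h5ad").items() and printing each key with its type and version would show whether '/layers' (and other groups) declare "dict", which the reader in the traceback rejects.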

bunnech commented 2 months ago

That might be due to a mismatch between the scanpy and anndata versions used to write and to read the files. Please also check previous issues in this repository.
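To act on the version hint, one can check the installed anndata version before loading. The sketch below assumes that the "dict" encoding type for groups such as /layers belongs to the on-disk format introduced in anndata 0.8, so older readers fail on it. That threshold is an assumption, not something stated in this thread. parse_version, anndata_too_old and installed_version are hypothetical helpers.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Assumption: files declaring encoding-type "dict" need anndata >= 0.8 to read.
MIN_ANNDATA = (0, 8)


def parse_version(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric release parts, e.g. '0.10.3' -> (0, 10, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def anndata_too_old(version: str, minimum: Tuple[int, ...] = MIN_ANNDATA) -> bool:
    """Return True if `version` is older than `minimum` (tuple comparison)."""
    return parse_version(version) < minimum


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

For instance, if installed_version("anndata") returns a version for which anndata_too_old is True, upgrading anndata in the cellot environment would be the first thing to try. Pre-release suffixes such as "rc1" are ignored by this simple parser, which is sufficient for a coarse threshold check.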