gao-lab / Cell_BLAST

A BLAST-like toolkit for large-scale scRNA-seq data querying and annotation.
http://cblast.gao-lab.org
MIT License

KeyError: "Unable to open object (object 'obs' doesn't exist)" #16

Closed: MartaBenegas closed this issue 3 years ago

MartaBenegas commented 3 years ago

Hi, it's me again! I was able to construct the database, but now I'm having some issues with my input file.

Originally, my input file was in CSV format, so I converted it to HDF5 (.h5) using the h5write() function from the rhdf5 R library as follows:

library(rhdf5)
cells <- read.csv(file = "/home/biobam/Downloads/tabula_muris_dataset/Brain_Myeloid-counts.csv", header = TRUE, row.names = 1, quote = "")
cells <- as.matrix(cells)
h5write(cells, "/home/biobam/Downloads/brain_cells.h5", "brain")

But when I tried to read it, it raised the following error:

>>> cells  = cb.data.ExprDataSet.read_dataset("/home/biobam/Downloads/cell_blast_test/brain_cells.h5")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/Cell_BLAST/data.py", line 553, in read_dataset
    dict_from_group(f["obs"]),
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/h5py/_hl/group.py", line 288, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'obs' doesn't exist)"

Is there a problem with my input file? Here is a link to a Drive folder with the original CSV and the converted h5 files.

Thanks in advance! Marta.

Jeff1995 commented 3 years ago

Hi Marta,

That's because the ".h5" file Cell BLAST uses follows a special format specification that requires certain HDF5 groups, such as "exprs" for the expression matrix, "obs" for cell-level meta information, "var" for gene-level metadata, etc. (Basically, it is a simplified version of anndata.) It's complaining about the missing "obs" group because your HDF5 file does not comply with the required format.
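As a quick diagnostic, here is a minimal sketch that lists which of the required top-level groups are missing from a file's keys. Only the three group names mentioned above are checked (the "etc." suggests the real specification has more), and the helper `missing_groups` is hypothetical, not part of Cell_BLAST. It accepts anything that supports `in`, such as a set of names or an opened `h5py.File`.

```python
from typing import Container, List

# Group names taken from the reply above; the actual format may require more.
REQUIRED_GROUPS = ("exprs", "obs", "var")


def missing_groups(keys: Container[str]) -> List[str]:
    """Return the required top-level groups that are absent from `keys`.

    Hypothetical helper: `keys` may be any container supporting `in`,
    e.g. a set of names or an opened h5py.File object.
    """
    return [name for name in REQUIRED_GROUPS if name not in keys]


# The rhdf5 call above wrote a single dataset named "brain" at the root,
# so none of the expected groups are present.
print(missing_groups({"brain"}))  # ['exprs', 'obs', 'var']
```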

Actually, there is no need to write the HDF5 file yourself. We have a dedicated function for reading text-based data files like this one:

import Cell_BLAST as cb
ds = cb.data.ExprDataSet.read_table("Brain_Myeloid-counts.csv", orientation="gc", sparsify=True, index_col=0)

See this documentation for details.
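To illustrate what `orientation="gc"` refers to: a CSV like this one stores genes as rows and cells as columns, whereas the expression matrix is held with cells as rows (anndata-style). The standard-library sketch below reads such a genes-by-cells table and transposes it. It is only an illustration of the idea under that assumption, not how `read_table` is actually implemented; the function `read_gc_table` and the toy data are hypothetical.

```python
import csv
import io
from typing import List, TextIO, Tuple


def read_gc_table(handle: TextIO) -> Tuple[List[str], List[str], List[List[float]]]:
    """Read a genes-by-cells CSV; return (cells, genes, cells-by-genes matrix).

    Hypothetical helper: the first column holds gene names (like index_col=0)
    and the header row holds cell names.
    """
    reader = csv.reader(handle)
    header = next(reader)
    cells = header[1:]  # the first header entry sits above the gene-name column
    genes: List[str] = []
    gene_rows: List[List[float]] = []
    for row in reader:
        genes.append(row[0])
        gene_rows.append([float(x) for x in row[1:]])
    # Transpose so that row i is cell i and column j is gene j.
    matrix = [list(col) for col in zip(*gene_rows)]
    return cells, genes, matrix


# Toy genes-by-cells table (made-up values).
example = io.StringIO('"",cell_A,cell_B\nGene1,0,3\nGene2,5,0\n')
cells, genes, matrix = read_gc_table(example)
print(cells)   # ['cell_A', 'cell_B']
print(matrix)  # [[0.0, 5.0], [3.0, 0.0]]
```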