Error when loading 10X Cellranger output with read_ATAC_10x()

pabloswfly commented 3 years ago

Hi Anna! I've been trying to load the output from 10X's CellRanger scATACseq aggregated pipeline into EpiScanpy:

mtx_file = "/home/pab/ESPACE_08/outs_aggregate/filtered_peak_bc_matrix/matrix.mtx"
tsv_file = "/home/pab/ESPACE_08/outs_aggregate/filtered_peak_bc_matrix/barcodes.tsv"
bed_file = "/home/pab/ESPACE_08/outs_aggregate/filtered_peak_bc_matrix/peaks.bed"
adata2 = epi.pp.read_ATAC_10x(mtx_file, cell_names=tsv_file, var_names=bed_file)

However, the read_ATAC_10x() function raises an error. The error log:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-028937a6930d> in <module>
      2 tsv_file = "/home/pab/ESPACE_08/outs_aggregate/filtered_peak_bc_matrix/barcodes.tsv"
      3 bed_file = "/home/pab/ESPACE_08/outs_aggregate/filtered_peak_bc_matrix/peaks.bed"
----> 4 adata2 = epi.pp.read_ATAC_10x(mtx_file, cell_names=tsv_file, var_names=bed_file)

~/miniconda3/envs/csg.p/lib/python3.9/site-packages/episcanpy/preprocessing/_load_matrix.py in read_ATAC_10x(matrix, cell_names, var_names, path_file)
     36         var_names = ["_".join(x[:-1].split('\t')) for x in var_names]
     37 
---> 38     adata = ad.AnnData(mat, obs=pd.DataFrame(index=barcodes), var=pd.DataFrame(index=var_names))
     39     adata.uns['omic'] = 'ATAC'
     40 

~/miniconda3/envs/csg.p/lib/python3.9/site-packages/anndata/_core/anndata.py in __init__(self, X, obs, var, uns, obsm, varm, layers, raw, dtype, shape, filename, filemode, asview, obsp, varp, oidx, vidx)
    305                 raise ValueError("`X` has to be an AnnData object.")
    306             self._init_as_view(X, oidx, vidx)
--> 307         else:
    308             self._init_as_actual(
    309                 X=X,

~/miniconda3/envs/csg.p/lib/python3.9/site-packages/anndata/_core/anndata.py in _init_as_actual(self, X, obs, var, uns, obsm, varm, varp, obsp, raw, layers, dtype, shape, filename, filemode)
    470             elif isinstance(X, ZarrArray):
    471                 X = X.astype(dtype)
--> 472             else:  # is np.ndarray or a subclass, convert to true np.ndarray
    473                 X = np.array(X, dtype, copy=False)
    474             # data matrix and shape

TypeError: float() argument must be a string or a number, not 'coo_matrix'

I believe I'm following the correct procedure, as described in your beta_tutorial_10x_pbmc.html tutorial. Any hint on what might be causing this? I'm using episcanpy==0.3.1 and anndata==0.7.5.
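A possible workaround for this kind of error is to assemble the AnnData object by hand, converting the matrix to CSR before passing it on. The sketch below is not episcanpy's implementation. It assumes that matrix.mtx is stored as peaks x cells, that barcodes.tsv holds one barcode per line, and that peaks.bed holds tab-separated chrom, start and end columns. The helper names read_lines, read_peak_names and load_peak_matrix are hypothetical.

```python
from pathlib import Path


def read_lines(path):
    """Return the non-empty lines of a text file, without trailing newlines."""
    return [line for line in Path(path).read_text().splitlines() if line]


def read_peak_names(bed_path):
    """Build one 'chrom_start_end' name per peak from a tab-separated BED file."""
    return ["_".join(line.split("\t")) for line in read_lines(bed_path)]


def load_peak_matrix(mtx_path, barcodes_path, bed_path):
    """Assemble an AnnData object (cells x peaks) from the three 10x files."""
    # Imported here so the two helpers above work without the scientific stack.
    import anndata as ad
    import pandas as pd
    from scipy.io import mmread
    from scipy.sparse import csr_matrix

    # mmread returns a COO matrix for sparse files. Assumption: the file is
    # peaks x cells, so transpose to cells x peaks and convert to CSR.
    counts = csr_matrix(mmread(mtx_path).T)

    barcodes = read_lines(barcodes_path)
    peaks = read_peak_names(bed_path)

    adata = ad.AnnData(
        counts,
        obs=pd.DataFrame(index=barcodes),
        var=pd.DataFrame(index=peaks),
    )
    adata.uns["omic"] = "ATAC"  # same tag that read_ATAC_10x sets in the traceback
    return adata
```

With the variables from the snippet above, this would be called as load_peak_matrix(mtx_file, tsv_file, bed_file). Keeping the counts sparse also avoids building a dense cells x peaks array in memory.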

P.S.: Is there a way to load my filtered_peak_bc_matrix.h5 directly into EpiScanpy, similar to what some R packages like Seurat offer?

Thanks!

DaneseAnna commented 3 years ago

Hi, thank you for reporting the issue!

Apparently we had some backward-compatibility problems with Cell Ranger and anndata, so I just fixed the function and added a new function, epi.pp.read_h5, so that you can read the HDF5 (.h5) format from Cell Ranger directly. Everything should work if you install the master branch using pip install git+https://github.com/colomemaria/epiScanpy
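After reinstalling, a quick check like the following can confirm which versions are active in the environment and whether epi.pp.read_h5 is present. This is only a sketch. The helper names installed_version and has_read_h5 are hypothetical, and the import path episcanpy.api is an assumption that may differ between releases.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def has_read_h5():
    """Report whether the installed episcanpy exposes epi.pp.read_h5."""
    try:
        # Assumption: this import path may differ between episcanpy releases.
        import episcanpy.api as epi
    except ImportError:
        return False
    return hasattr(epi.pp, "read_h5")


if __name__ == "__main__":
    for name in ("episcanpy", "anndata"):
        print(name, installed_version(name))
    print("read_h5 available:", has_read_h5())
```

If read_h5 is reported as missing after the install, the environment is probably still picking up the older release.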

Best, Anna

pabloswfly commented 3 years ago

Super, the function is fast and works like a charm. Thanks for the quick response and fix!

colomemaria / epiScanpy

Error when loading 10X Cellranger output with read_ATAC_10x() #96