BioJulia / SingleCellProjections.jl

Analysis of Single Cell Expression data in Julia

Read error on TabulaSapiens 10X data #22

Open rasmushenningsson opened 3 weeks ago

rasmushenningsson commented 3 weeks ago

Transferred issue from here: https://github.com/rasmushenningsson/SingleCell10x.jl/issues/5

Hi Rasmus,

Just for fun, I thought I'd try applying SingleCellProjections to a dataset I had on hand, the TabulaSapiens eye .h5ad file. If you want to grab this file yourself, you can find it here: https://figshare.com/articles/dataset/Tabula_Sapiens_release_1_0/14267219?file=34701970.

However, it seems that load_counts isn't particularly happy with this file:

(jl_NPMjKi) pkg> st # Julia 1.11.0-rc2
Status `/tmp/jl_NPMjKi/Project.toml`
  [03d38035] SingleCellProjections v0.4.0

julia> counts = load_counts("/data/gene_variants_data/extracted/tabula_sapiens/TS_Eye.h5ad/TS_Eye.h5ad")
ERROR: unexpected character '\x04' after quoted field at row 250 column 1
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] 
    @ DelimitedFiles ~/.julia/packages/DelimitedFiles/aGcsu/src/DelimitedFiles.jl:727
  [3] readdlm_string(sbuff::String, dlm::Char, T::Type, eol::Char, auto::Bool, optsd::Dict{Symbol, Union{…}})
    @ DelimitedFiles ~/.julia/packages/DelimitedFiles/aGcsu/src/DelimitedFiles.jl:461
  [4] readdlm_auto(input::IOStream, dlm::Char, T::Type, eol::Char, auto::Bool; opts::@Kwargs{})
    @ DelimitedFiles ~/.julia/packages/DelimitedFiles/aGcsu/src/DelimitedFiles.jl:231
  [5] readdlm_auto
    @ ~/.julia/packages/DelimitedFiles/aGcsu/src/DelimitedFiles.jl:231 [inlined]
  [6] readdlm
    @ ~/.julia/packages/DelimitedFiles/aGcsu/src/DelimitedFiles.jl:226 [inlined]
  [7] readdlm
    @ ~/.julia/packages/DelimitedFiles/aGcsu/src/DelimitedFiles.jl:86 [inlined]
  [8] _read10x_features(io::IOStream; delim::Char)
    @ SingleCell10x ~/.julia/packages/SingleCell10x/XLhW5/src/fileio.jl:306
  [9] _read10x_features
    @ ~/.julia/packages/SingleCell10x/XLhW5/src/fileio.jl:305 [inlined]
 [10] #19
    @ ~/.julia/packages/SingleCell10x/XLhW5/src/fileio.jl:315 [inlined]
 [11] #1
    @ ~/.julia/packages/SingleCell10x/XLhW5/src/fileio.jl:57 [inlined]
 [12] open(f::SingleCell10x.var"#1#2"{SingleCell10x.var"#19#20"{@Kwargs{delim::Char}}, Bool}, args::String; kwargs::@Kwargs{})
    @ Base ./io.jl:410
 [13] open
    @ ./io.jl:407 [inlined]
 [14] _open(f::SingleCell10x.var"#19#20"{@Kwargs{delim::Char}}, filename::String)
    @ SingleCell10x ~/.julia/packages/SingleCell10x/XLhW5/src/fileio.jl:55
 [15] #_read10x_features#18
    @ ~/.julia/packages/SingleCell10x/XLhW5/src/fileio.jl:314 [inlined]
 [16] _read10x_features_triplet(filename::String; guess::Function, kwargs::@Kwargs{})
    @ SingleCell10x ~/.julia/packages/SingleCell10x/XLhW5/src/fileio.jl:329
 [17] _read10x_features_triplet
    @ ~/.julia/packages/SingleCell10x/XLhW5/src/fileio.jl:327 [inlined]
 [18] #_read10x_features_autodetect#22
    @ ~/.julia/packages/SingleCell10x/XLhW5/src/fileio.jl:324 [inlined]
 [19] _read10x_features_autodetect
    @ ~/.julia/packages/SingleCell10x/XLhW5/src/fileio.jl:320 [inlined]
 [20] read10x_features(io::String, featuretype::Type; kwargs::@Kwargs{})
    @ SingleCell10x ~/.julia/packages/SingleCell10x/XLhW5/src/fileio.jl:357
 [21] read10x_features
    @ ~/.julia/packages/SingleCell10x/XLhW5/src/fileio.jl:356 [inlined]
 [22] _load10x_metadata
    @ ~/.julia/packages/SingleCellProjections/0yZXZ/src/load.jl:96 [inlined]
 [23] load10x(filename::String; lazy::Bool, var_id::Nothing, var_id_delim::Char, kwargs::@Kwargs{})
    @ SingleCellProjections ~/.julia/packages/SingleCellProjections/0yZXZ/src/load.jl:135
 [24] load10x
    @ ~/.julia/packages/SingleCellProjections/0yZXZ/src/load.jl:130 [inlined]
 [25] #29
    @ ./broadcast.jl:1306 [inlined]
 [26] _broadcast_getindex_evalf
    @ ./broadcast.jl:673 [inlined]
 [27] _broadcast_getindex
    @ ./broadcast.jl:646 [inlined]
 [28] getindex
    @ ./broadcast.jl:605 [inlined]
 [29] copy
    @ ./broadcast.jl:906 [inlined]
 [30] materialize
    @ ./broadcast.jl:867 [inlined]
 [31] load_counts(loadfun::typeof(load10x), filenames::String; sample_names::Nothing, sample_name_col::Nothing, lazy::Bool, lazy_merge::Bool, obs_id_col::String, obs_id_delim::Char, obs_id_prefixes::Nothing, extra_var_id_cols::Symbol, duplicate_var::Nothing, duplicate_obs::Nothing, callback::Nothing, kwargs::@Kwargs{})
    @ SingleCellProjections ~/.julia/packages/SingleCellProjections/0yZXZ/src/load.jl:221
 [32] load_counts
    @ ~/.julia/packages/SingleCellProjections/0yZXZ/src/load.jl:199 [inlined]
 [33] load_counts(filenames::String)
    @ SingleCellProjections ~/.julia/packages/SingleCellProjections/0yZXZ/src/load.jl:227
 [34] top-level scope
    @ REPL[8]:1
Some type information was truncated. Use `show(err)` to see complete types.

After locally modifying _read10x_features to print the first 200 bytes of the IO that readdlm was called on, this is what I find:

\x89HDF\r\n\x1a\n\0\0\0\0\0\b\b\0\x04\0\x10\0\0\0\0\0\0\0\0\0\0\0\0\0\xff\xff\xff\xff\xff\xff\xff\xff\$\x84\xd5F\0\0\0\0\xff\xff\xff\xff\xff\xff\xff\xff\0\0\0\0\0\0\0\0`\0\0\0\0\0\0\0\x01\0\0\0\0\0\0\0\x88\0\0\0\0\0\0\0\xa8\x02\0\0\0\0\0\0\x01\0\x01\0\x01\0\0\0\x18\0\0\0\0\0\0\0\x11\0\x10\0\0\0\0\0\x88\0\0\0\0\0\0\0\xa8\x02\0\0\0\0\0\0TREE\0\0\x01\0\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\0\0\0\0\0\0\0\0\xe0\x05\0\0\0\0\0\0 \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0...
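
The first eight bytes of that dump, \x89HDF\r\n\x1a\n, are the signature that HDF5 files normally start with, so the delimited-text reader is being handed a binary HDF5 file. A minimal sketch of a check for this, using only Base Julia, could look like the following. The names HDF5_SIGNATURE and looks_like_hdf5 are hypothetical and not part of SingleCell10x or SingleCellProjections, and the sketch assumes the file has no HDF5 user block (in which case the signature would sit at a later offset).

```julia
# Signature that HDF5 files (and hence .h5ad files) normally begin with.
# Assumption: no user block precedes the superblock.
const HDF5_SIGNATURE = UInt8[0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a]

# Hypothetical helper: true if `io` starts with the HDF5 signature.
# The stream position is restored afterwards, so a real reader
# can still start from the beginning of the stream.
function looks_like_hdf5(io::IO)
    mark(io)
    header = read(io, length(HDF5_SIGNATURE))  # reads at most 8 bytes
    reset(io)
    return header == HDF5_SIGNATURE
end

# Convenience method for a file path.
looks_like_hdf5(path::AbstractString) = open(looks_like_hdf5, path)
```

A guard like this, run before readdlm, would allow a clearer error message than "unexpected character '\x04' after quoted field".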

rasmushenningsson commented 3 weeks ago

Hi @tecosaur,

Unfortunately the h5ad support is experimental and not really tested. (There are different versions of the .h5ad format and I've only tried with one of them, quite some time ago.)

At the moment, you need to use loadh5ad, not load_counts, to load .h5ad files. But I tried it with your file and that doesn't work either. 🤦 (I should really try to improve the interface here.)
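
To illustrate the distinction, here is a rough sketch of choosing a loader from the file extension. The helper loader_for is hypothetical and not part of SingleCellProjections. It returns the name of the loader as a Symbol instead of calling it, so the sketch runs without the package installed, and it assumes that anything not ending in .h5ad should go through load10x, as load_counts did in the stack trace above.

```julia
# Hypothetical helper: pick a loader name from the file extension.
# Returns a Symbol rather than calling into SingleCellProjections.
function loader_for(filename::AbstractString)
    name = lowercase(filename)
    endswith(name, ".h5ad") && return :loadh5ad
    return :load10x  # assumption: everything else is 10x-style input
end

loader_for("TS_Eye.h5ad")
```

Extension-based dispatch is only a heuristic; it says nothing about which version of the .h5ad format a file uses.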

The good news is that it's probably an easy fix. I'll try to take a look soon!