dcjones / proseg

Probabilistic cell segmentation for in situ spatial transcriptomics

Proseg fails to read any transcripts from some .csv.gz files #29

Open GBeattie opened 4 months ago

GBeattie commented 4 months ago

Hi,

I was coming across a lot of missing FOVs and saw that you recently updated proseg to 1.0.6, which improves compatibility with CosMx data and may help with that issue (#26). However, I now run into an error. I'll put the error and a sample of the transcript file below. Note: I managed to run this on 1.0.5 by editing the column names, so the data should be OK. Removing the --use-cell-initialization flag results in the same error.

(base) gordonbeattie@192 L1_SU500 % proseg -V
proseg 1.0.6
(base) gordonbeattie@192 L1_SU500 % proseg --cosmx L1_SU500_tx_file.csv.gz --use-cell-initialization
Using 8 threads
thread 'main' panicked at /Users/gordonbeattie/.cargo/registry/src/index.crates.io-6f17d22bba15001f/proseg-1.0.6/src/main.rs:511:18:
index out of bounds: the len is 0 but the index is 0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
(base) gordonbeattie@192 L1_SU500 % gzip -cd L1_SU500_tx_file.csv.gz| head
fov,cell_ID,cell,x_local_px,y_local_px,x_global_px,y_global_px,z,target,CellComp
1,0,c_1_1_0,4243,67,66309.5474243164,55391.5182749431,7,Scd2,None
1,0,c_1_1_0,4243,1035,66309.4480832418,54423.7534205119,1,Tmsb4x,None
1,0,c_1_1_0,4242,1627,66308.8361422221,53831.2435150147,3,Atp1a2,None
1,0,c_1_1_0,4242,1898,66308.2679112752,53560.6026649475,1,Pcp4,None
1,0,c_1_1_0,4243,2704,66309.7858428955,52753.8736661275,1,Pfkm,None
1,0,c_1_1_0,4243,3836,66309.1977437337,51622.1841176351,1,Cst7,None
1,0,c_1_1_0,4243,3880,66309.7858428955,51577.8342882792,5,Mdh1,None
1,0,c_1_1_0,4242,3863,66308.2281748454,51595.2348709107,3,Camk2a,None
1,0,c_1_1_0,4242,3846,66308.2679112752,51611.9321187337,5,Rps9,None

Thanks in advance for any assistance!

All the best, Gordon

dcjones commented 4 months ago

Hi Gordon,

It seems that no transcripts were read for some reason. I'm not sure what's going on here, but can you confirm that the cell_ID column has some non-zero values in this data?
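
For a file too large to load comfortably, the check asked for above can be done by streaming. This is only a minimal sketch: the function name `summarize_cell_ids` is made up for illustration, the default column name `cell_ID` is taken from the header posted earlier in this thread, and nothing here reflects how proseg itself parses the file.

```python
import csv
import gzip


def summarize_cell_ids(path, column="cell_ID"):
    """Return (total_rows, rows_with_nonzero_value) for `column` in a gzipped CSV.

    Hypothetical helper for illustration; it streams the file row by row,
    so the whole table is never held in memory.
    """
    total = 0
    nonzero = 0
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle)
        # An empty file has no header; a renamed column is reported explicitly.
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found in header: {reader.fieldnames}")
        for row in reader:
            total += 1
            # Treat empty strings and "0" as "no cell assigned" (an assumption).
            if row[column].strip() not in ("", "0"):
                nonzero += 1
    return total, nonzero
```

Calling `summarize_cell_ids("L1_SU500_tx_file.csv.gz")` would return the two counts as a tuple. If the second count is 0, or a `KeyError` is raised, that points to a column-naming or content problem in the file.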

GBeattie commented 4 months ago

Thanks for the response, I can confirm the cell_ID has some non-zero values, although most of them are 0. I'll put a few metrics below to give a little more insight.

> head(table(tx.list$Nanostring$cell_ID))
       0        1        2        3        4        5 
18825028    60665    53141    58098    63807    59214 

> length(unique(tx.list$Nanostring$cell_ID))
[1] 1484

> length(unique(tx.list$Nanostring$fov))
[1] 169

> length(unique(tx.list$Nanostring$cell))
[1] 160243

ximbao commented 3 months ago

Having the same issue trying to run on CosMx data.

proseg --cosmx Diana_HEM_CR_FF_EM_NR4A145koko315pA7_STJ_N_R1_tx_file.csv.gz
Using 192 threads
thread 'main' panicked at /home/fsegato/.cargo/registry/src/index.crates.io-6f17d22bba15001f/proseg-1.1.0/src/main.rs:521:18:
index out of bounds: the len is 0 but the index is 0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

asmilagswash commented 2 months ago

Having the same issue with CosMx data.

proseg --cosmx S22113961S22113960_tx_file.csv.gz --use-cell-initialization
Using 24 threads
thread 'main' panicked at /home/asmilags/.cargo/registry/src/index.crates.io-6f17d22bba15001f/proseg-1.1.3/src/main.rs:522:18:
index out of bounds: the len is 0 but the index is 0
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::panicking::panic_bounds_check
   3: proseg::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

dbuszta commented 2 months ago

And the same with Xenium data processed with xenium_ranger version 3:

proseg --xenium transcripts.csv.gz
Using 2 threads
thread 'main' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/proseg-1.0.0/src/main.rs:474:18:
index out of bounds: the len is 0 but the index is 0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

dcjones commented 2 months ago

Each of these reported errors is due to no transcripts being read by proseg, but I've not been able to suss out where the loss might be occurring, and I haven't been able to reproduce it.

If someone would be so kind as to email (or otherwise send) me a dataset that reproduces it, I'll fix this right away. I suspect that if this error happens with the full transcripts file, the first 10k lines or so should generate the same error and be small enough to email.
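
For anyone preparing such a subset, here is a rough sketch of one way to cut a gzipped transcripts file down to its header plus the first 10k rows. The function name `write_head_subset` and the output filename are hypothetical. It copies lines verbatim, so it assumes no quoted field contains an embedded newline.

```python
import gzip
from itertools import islice


def write_head_subset(src, dst, n_rows=10_000):
    """Copy the header line plus the first `n_rows` data lines of a gzipped CSV.

    Hypothetical helper for illustration. Returns the number of data lines
    written, which is fewer than `n_rows` if the source file is shorter.
    """
    written = -1  # start at -1 so the header line is not counted as data
    # newline="" keeps the original line endings untouched in both files.
    with gzip.open(src, "rt", newline="") as fin, \
            gzip.open(dst, "wt", newline="") as fout:
        for line in islice(fin, n_rows + 1):
            fout.write(line)
            written += 1
    return max(written, 0)
```

For example, `write_head_subset("transcripts.csv.gz", "transcripts_head.csv.gz")` writes the smaller file next to the original. It is worth re-running proseg on the subset to confirm that it still panics before sending it.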

dcjones commented 1 month ago

I just released version 1.1.5, which I believe fixes this issue. Thanks for your patience, and thanks to @dbuszta for sharing a dataset to reproduce the issue.