dcjones / proseg

Probabilistic cell segmentation for in situ spatial transcriptomics

Different number of transcripts read #15

Closed by roanvanscheppingen 3 months ago

roanvanscheppingen commented 5 months ago

Running proseg as follows reports about 11 million transcripts read:

(Roan) [rvanscheppingen@n0118 All_transcripts]$ proseg full_transcripts.csv.gz --cosmx --coordinate-scale 1 --output-maxpost-counts counts.csv.gz --nthreads 14 --voxel-layers 15

Using 14 threads
Read 11075450 transcripts

The same happens without the --voxel-layers argument. However, the CSV file contains roughly twice as many lines:

(Roan) [rvanscheppingen@n0118 All_transcripts]$ zcat full_transcripts.csv.gz | wc -l

22073511
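
Note that wc -l also counts the header line, so the number of transcript records is one less than the line count if the file has a header. As a rough cross-check, the records can be counted with a short Python sketch like the one below. The helper name count_data_rows is made up for illustration, and the sketch assumes the file is a plain comma-separated file with a single header row.

```python
import csv
import gzip


def count_data_rows(path, has_header=True):
    """Count CSV records in a gzip-compressed file.

    Hypothetical helper for illustration; assumes a comma-separated file
    with at most one header row.
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # discard the header row if present
        # Count the remaining records one by one without loading the file.
        return sum(1 for _ in reader)


# Example call (file name taken from the command above):
# count_data_rows("full_transcripts.csv.gz")
```

Comparing this count with the "Read ... transcripts" line in the proseg output shows how many records were dropped.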

Is there subsampling, or does proseg only take transcripts within a certain distance of the polygons? I've seen similar behaviour before, but there the difference was about 20K transcripts out of a total of 1 million (same dataset). For now I assume that some transcripts are very far away from cells and are therefore not taken into account. I'll let it run and inspect later.

Edit: on further inspection, the --coordinate-scale setting is what causes the difference. Reducing it from 1 to the recommended 0.12 for CosMx data returns close to 22 million transcripts. I'm not sure whether it is intended to influence the maximum distance to a cell in this way.
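
To see how much the scale changes the coordinate range, a sketch like the following computes the bounding box of the transcripts after scaling. It is only an illustration under assumptions: the helper name scaled_extent and the column names passed to it are hypothetical, and it assumes the scale is a simple multiplier applied to the input coordinates (for example pixels to microns). How proseg uses the scaled coordinates internally is not shown here.

```python
import csv
import gzip


def scaled_extent(path, x_col, y_col, scale):
    """Return (width, height) of the transcript bounding box after scaling.

    Hypothetical helper; x_col and y_col are whatever the coordinate
    columns are called in the file, and scale is assumed to be a plain
    multiplier on those coordinates.
    """
    x_min = y_min = float("inf")
    x_max = y_max = float("-inf")
    seen = False
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle):
            x = float(row[x_col]) * scale
            y = float(row[y_col]) * scale
            x_min, x_max = min(x_min, x), max(x_max, x)
            y_min, y_max = min(y_min, y), max(y_max, y)
            seen = True
    if not seen:
        raise ValueError("no data rows found")
    return (x_max - x_min, y_max - y_min)
```

Calling it once with scale 1 and once with scale 0.12 on the same file shows the two extents side by side. With scale 1 the extent is roughly 8 times larger in each direction, which would matter if any distance threshold is expressed in scaled units.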