dcjones / proseg

Probabilistic cell segmentation for in situ spatial transcriptomics

Population of cell_metadata is not exactly the same #16

Open roanvanscheppingen opened 3 months ago

roanvanscheppingen commented 3 months ago

The population column of cell_metadata describes the number of transcripts per cell. However, this number differs from the per-cell counts derived from the transcript_metadata file.

Cell 0, population = 1185 -- transcripts in counts 1218 -- in expec counts = 1145,118
Cell 1, population = 448 -- transcripts in counts 508 -- in expec counts = 440,7789
Cell 2, population = 713 -- transcripts in counts 767 -- in expec counts = 685,7118

The values in counts are equal to those you get by subsetting the transcript_metadata file on the assignment column.
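For reference, the comparison described above can be sketched roughly as follows. This is only a minimal illustration on toy data: the `assignment` and `population` column names come from this thread, while the `cell` column name and all the values are made up for the example.

```python
import pandas as pd

# Toy stand-in for transcript_metadata (values are invented).
# "assignment" is the column named in this thread.
transcripts = pd.DataFrame({"assignment": [0, 0, 0, 1, 1, 2]})

# Toy stand-in for cell_metadata; "cell" is a hypothetical id column.
cells = pd.DataFrame({"cell": [0, 1, 2], "population": [3, 1, 1]})

# Count transcripts per cell by grouping on the assignment column.
counted = transcripts.groupby("assignment").size().rename("n_assigned")

# Put the recounted values next to the reported population.
comparison = cells.set_index("cell").join(counted)
comparison["difference"] = comparison["n_assigned"] - comparison["population"]
print(comparison)
```

A nonzero `difference` marks the cells where the two files disagree.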

dcjones commented 3 months ago

These are being computed slightly differently, which I agree is confusing, so I'll reconcile them in a future version.

Expected counts are from transcript assignments averaged over many samples, so this will not exactly agree with the point estimates reported in the metadata tables.
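To illustrate why an average over samples need not match a point estimate, here is a small toy sketch. It is not proseg's actual code: the sample matrix is invented, and taking the most frequent assignment per transcript as the "point estimate" is an assumption made only for this example.

```python
import numpy as np

# Hypothetical sampled assignments: rows are samples, columns are transcripts,
# values are the cell id each transcript was assigned to in that sample.
samples = np.array([
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
])
n_cells = 2

# Expected counts: count transcripts per cell in each sample, then average.
per_sample_counts = np.stack([np.bincount(s, minlength=n_cells) for s in samples])
expected_counts = per_sample_counts.mean(axis=0)

# Point estimate (assumed here): each transcript goes to its most frequent cell,
# and cells are counted once from those single assignments.
point_assignment = np.array(
    [np.bincount(col, minlength=n_cells).argmax() for col in samples.T]
)
point_counts = np.bincount(point_assignment, minlength=n_cells)

print("expected:", expected_counts)
print("point:   ", point_counts)
```

In this toy case the averaged counts are fractional while the point counts are integers, so the two only agree when every sample yields the same assignments.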

yihming commented 3 weeks ago

Hello. I just wanna follow up on this discussion.

In my case, I have cell 0 with population = 858, while in transcript-metadata.csv.gz, 10,751 transcripts are assigned to cell 0. After filtering with background==0 and confusion==0, I only have 819 transcripts left, still not consistent with population.

Could you guide me on figuring this out? I think I don't fully understand the criteria for deciding whether a transcript is valid. Should I also consider probability for the filtering?

Any help would be appreciated! Thanks!
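For anyone trying to reproduce this kind of check, the filtering described above might look roughly like the sketch below. The column names `assignment`, `background`, `confusion`, and `probability` are the ones mentioned in this thread; the data, the helper `count_for_cell`, and the 0.5 threshold are hypothetical, and nothing here confirms which filter (if any) reproduces `population`.

```python
import pandas as pd

# Toy stand-in for transcript-metadata.csv.gz (values are invented).
transcripts = pd.DataFrame({
    "assignment":  [0, 0, 0, 0, 1, 1],
    "background":  [0, 1, 0, 0, 0, 0],
    "confusion":   [0, 0, 1, 0, 0, 0],
    "probability": [0.9, 0.8, 0.7, 0.4, 0.95, 0.6],
})

def count_for_cell(df, cell, min_probability=None):
    """Count transcripts assigned to `cell` with background == 0 and confusion == 0.

    Hypothetical helper: optionally also require probability >= min_probability.
    """
    mask = (df["assignment"] == cell) & (df["background"] == 0) & (df["confusion"] == 0)
    if min_probability is not None:
        mask &= df["probability"] >= min_probability
    return int(mask.sum())

print(count_for_cell(transcripts, 0))                       # background/confusion filter only
print(count_for_cell(transcripts, 0, min_probability=0.5))  # plus an assumed probability cutoff
```

Comparing the counts at each filtering step against `population` shows which criterion, if any, accounts for the gap.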