aristoteleo / dynast-release

Inclusive and efficient quantification of labeling and splicing RNAs for time-resolved metabolic labeling based scRNA-seq experiments
https://dynast-release.readthedocs.io/en/latest/
MIT License

Dynast count error: #6

Closed mav-mit closed 2 years ago

mav-mit commented 2 years ago

Despite running dynast count on the same data, the results differ depending on whether we run it with "TC,GA", "GA", or "TC". There seems to be an increase in whichever conversion you call for (i.e., higher TC when you look for TC).

210809_dual_labeling.pdf

Lioscro commented 2 years ago

Hi, @martinavillanueva, What exactly are you plotting here? Are these the mutation rates?

Assuming those are what you are plotting here, it is likely due to how UMI deduplication works. When multiple reads with the same cell barcode and UMI that map to the same gene are observed, the read with the most conversions of interest is selected.

Xiaojieqiu commented 2 years ago

Thanks! My understanding is that when Martin calls for TC,GA (with the --conversion TC,GA argument in dynast count), the TC and GA mutation rates differ from those obtained when calling for TC or GA separately (with the --conversion TC or --conversion GA argument). And when calling for TC or GA, the corresponding TC/GA mutation rate is higher than when you don't look for it. Is there any special treatment for the mutations requested via --conversion compared with the other mutations?

mav-mit commented 2 years ago

Exactly @Xiaojieqiu! Does that make sense @Lioscro ?

mav-mit commented 2 years ago

Take a look at the GA conversion and how it is lower when we don't look for the conversion (last slide) than when we do look for it (the top 2 slides).

210809_dual_labeling.pdf

Lioscro commented 2 years ago

I see what you mean. This is because in the UMI deduplication step, which read is selected depends on the number of conversions (see my previous comment). When you supply --conversion TC,GA, the read with the most TC+GA conversions is selected; when you supply --conversion GA, the read with the most GA conversions is selected; and likewise, when you supply --conversion TC, the read with the most TC conversions is selected. (To be exact, the order of priority is 1) the read that maps to the transcriptome (exon only), 2) the read that has the highest alignment score, 3) the read with the highest sum of the provided --conversion.)

Does that make sense? So it seems that you have many reads per UMI that map to the same gene, do not map to exons only, have (equal) maximum alignment score, but have quite different conversion numbers.
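
To make the priority order above concrete, here is a minimal Python sketch of that kind of selection. It is not dynast's actual code: the `Read` record, its field names, the `deduplicate` helper, and the toy conversion counts are all hypothetical and exist only for illustration. It assumes three reads that share a barcode, UMI, and gene, none of them exon-only, all with the same alignment score, so that only the conversion sum decides which read is kept.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class Read:
    """Hypothetical per-read record; field names are illustrative only."""
    read_id: str
    barcode: str
    umi: str
    gene: str
    exon_only: bool               # True if the read maps to the transcriptome (exon only)
    alignment_score: int
    conversions: Dict[str, int] = field(default_factory=dict)


def priority(read: Read, requested: Sequence[str]) -> Tuple[bool, int, int]:
    """Sort key following the order described in the thread:
    exon-only first, then alignment score, then sum of requested conversions."""
    total = sum(read.conversions.get(c, 0) for c in requested)
    return (read.exon_only, read.alignment_score, total)


def deduplicate(reads: Sequence[Read], requested: Sequence[str]) -> List[Read]:
    """Keep one read per (barcode, UMI, gene): the one with the highest priority.
    On an exact tie the first read seen is kept (an assumption of this sketch)."""
    best: Dict[Tuple[str, str, str], Read] = {}
    for read in reads:
        key = (read.barcode, read.umi, read.gene)
        if key not in best or priority(read, requested) > priority(best[key], requested):
            best[key] = read
    return list(best.values())


# Toy data: same barcode/UMI/gene, same alignment score, none exon-only.
reads = [
    Read("r1", "AAAC", "UMI1", "geneA", False, 60, {"TC": 3, "GA": 0}),
    Read("r2", "AAAC", "UMI1", "geneA", False, 60, {"TC": 0, "GA": 3}),
    Read("r3", "AAAC", "UMI1", "geneA", False, 60, {"TC": 2, "GA": 2}),
]

for requested in (["TC"], ["GA"], ["TC", "GA"]):
    kept = deduplicate(reads, requested)[0]
    print(",".join(requested), "->", kept.read_id, kept.conversions)
```

In this toy case a different read survives for each `--conversion` setting, so the TC count of the kept read is highest when only TC is requested and the GA count is highest when only GA is requested, which matches the pattern in the slides.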

mav-mit commented 2 years ago

I see. So the reason we see changes in the other conversions (see the blue and yellow circles) is that selecting reads for the conversion of interest changes which reads are kept, and that in turn changes the background rates. Is that right?

Would you expect this to affect the accuracy of calling new / old transcripts?

210809_dual_labeling_2.pdf

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days