Closed nickforino closed 2 years ago
This info might be one of the things output if you ask umi_tools to
--output-stats=FILE_NAME_PREFIX
. The information you are after will be in
the file names FILE_NAME_PREFIX_per_umi.tsv
.
Couple of things to note, however -
On Thu, 9 Sept 2021 at 22:53, nickforino @.***> wrote:
Hi there,
I've performed an amplicon sequencing experiment and simply want a diagnostic look into PCR duplication events in the library. Looking through the flatfile output of the group tool, I've noticed that some UMIs that are identical in sequence are assigned non-identical unique_ids (presumably because of differences in mapping?) which makes it difficult for me to get the true number of how many reads carry the same UMI. This is the important question I'm after; how many reads have the same UMI?
For example, here are some relevant lines from the flatfile:
read_id contig position gene umi umi_count final_umi final_umi_count unique_id NB551675:127:H5CLFAFX3:4:21612:24023:15475_GGGA XXX 0 NA GGGA 2935 GGGA 2935 1126 NB551675:127:H5CLFAFX3:2:11207:16465:18031_GGGA XXX 210 NA GGGA 1 GGGA 1 1737
The 'GGGA' UMI with ID 1126 was seen 2935 times, while the 'GGGA' with ID 1737 was seen once.
Should I be using a different tool to see the how many reads have identical reads? Count perhaps?
Thanks in advance for your help!
-Nick
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/CGATOxford/UMI-tools/issues/487, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABJELDXYBB2ADBFGHNESBTLUBEUD7ANCNFSM5DYBXKKA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
Did this work for you?
Hi Ian, apologies for my late reply,
Yes, the solution you outlined worked perfectly, thank you!
Your warning is well received, I'll outline my thinking and hopefully it is sound: We are performing amplicon sequencing of just one target (hence why we aren't interested in position), and thus we expect that PCR bias will manifest as one (or more) of the 256 UMIs taking the lion's share of the reads. Once I looked at the UMI counts, I did not see any UMI represented more than one standard deviation from the mean, so my takeaway is that there does not exist any PCR product that dominates the reaction when generating the library.
Again, thank you for your help, Ian! Best, Nick
Hi there,
I've performed an amplicon sequencing experiment and simply want a diagnostic look into PCR duplication events in the library. Looking through the flatfile output of the
group
tool, I've noticed that some UMIs that are identical in sequence are assigned non-identical unique_ids (presumably because of differences in mapping?) which makes it difficult for me to get the true number of how many reads carry the same UMI. This is the important question I'm after; how many reads have the same UMI?For example, here are some relevant lines from the flatfile:
read_id contig position gene umi umi_count final_umi final_umi_count unique_id NB551675:127:H5CLFAFX3:4:21612:24023:15475_GGGA XXX 0 NA GGGA 2935 GGGA 2935 1126 NB551675:127:H5CLFAFX3:2:11207:16465:18031_GGGA XXX 210 NA GGGA 1 GGGA 1 1737
The 'GGGA' UMI with ID 1126 was seen 2935 times, while the 'GGGA' with ID 1737 was seen once.
Should I be using a different tool to see the how many reads have identical reads?
Count
perhaps?Thanks in advance for your help!
-Nick