Closed ErminZ closed 1 month ago
Hi @ErminZ
Thanks for your question. I would expect the 10x adapters/primers to be kept in duplex reads, but I haven't tested this yet. I will try it out and get back to you.
Thank you for your reply! I also just tested 1 million duplex reads using the single-cell pipeline, and most of the reads have 10x primers. Please let me know if you could explain more about the trimming mechanism used by Dorado or wf-single-cell.
"BIOLOGICAL_duplex": {
"general": {
"n_reads": 1065534,
"rl_mean": 811.6867167073036,
"rl_std_dev": 446.9855681878009,
"n_fl": 483380,
"n_stranded": 989717
},
"strand_counts": {
"n_plus": 565033,
"n_minus": 424684
},
"detailed_config": {
"adapter1_f-adapter2_f": 278771,
"adapter1_f": 246063,
"adapter2_r": 228474,
"adapter2_r-adapter1_r": 167066,
"*": 39447,
"adapter2_f": 18974,
"adapter1_r": 12826,
"adapter2_f-adapter1_f": 12190,
"adapter1_f-adapter2_f-adapter2_r-adapter1_r": 10708,
"adapter1_f-adapter2_f-adapter2_r": 9347,
"adapter2_r-adapter1_r-adapter1_f-adapter2_f": 9240,
"adapter1_r-adapter2_r": 5760,
"adapter2_r-adapter1_r-adapter1_f": 4170,
"adapter1_f-adapter2_r-adapter2_f": 2135,
"adapter2_f-adapter2_r": 2554,
"adapter1_f-adapter2_r": 2486,
"adapter2_r-adapter2_f": 2294,
"adapter1_f-adapter1_r": 1824,
"adapter1_f-adapter1_r-adapter2_f": 1401,
"adapter2_r-adapter1_f": 1274,
"adapter2_r-adapter2_f-adapter1_r": 1158,
"adapter1_r-adapter1_f": 1237,
"adapter1_r-adapter1_f-adapter2_f-adapter2_r": 545,
"adapter2_r-adapter1_f-adapter1_r": 768,
"adapter2_f-adapter2_r-adapter1_r-adapter1_f": 740,
"adapter1_r-adapter1_f-adapter2_f": 615,
"adapter1_f-adapter2_f-adapter1_r-adapter2_r": 480,
"adapter2_r-adapter1_r-adapter2_f-adapter1_f": 651,
"adapter2_f-adapter2_r-adapter1_r": 592,
"adapter2_f-adapter1_f-adapter2_r": 173,
"adapter1_r-adapter2_f-adapter1_f": 161,
"adapter1_f-adapter2_f-adapter1_r": 131,
"adapter2_f-adapter1_r-adapter2_r": 158,
"adapter2_f-adapter1_r": 110,
"adapter2_f-adapter1_f-adapter1_r": 93,
"adapter1_r-adapter2_f": 109,
"adapter1_f-adapter1_r-adapter2_r": 45,
"adapter1_f-adapter2_r-adapter1_r": 87,
"adapter2_r-adapter2_f-adapter1_f": 120,
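To put the counts above in proportion, here is a rough sketch of the arithmetic. It uses only values shown in the summary and assumes that `n_fl` counts full-length reads and that the `*` entry counts reads in which no adapter was found; neither is confirmed in this thread.

```python
# Values copied from the "BIOLOGICAL_duplex" summary above.
n_reads = 1065534
n_fl = 483380        # assumed: full-length reads
n_stranded = 989717  # reads assigned to a strand
n_no_adapter = 39447 # the "*" entry; assumed: no adapter found


def percent(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


pct_fl = percent(n_fl, n_reads)
pct_stranded = percent(n_stranded, n_reads)
pct_no_adapter = percent(n_no_adapter, n_reads)

print(f"full length: {pct_fl:.1f}%")
print(f"stranded:    {pct_stranded:.1f}%")
print(f"no adapter:  {pct_no_adapter:.1f}%")
```

Under those assumptions, roughly 45% of the reads are full length, about 93% are stranded, and under 4% have no adapter at all.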
Hi @ErminZ
I believe that duplex reads will have the sequencing adapters trimmed off, while the parent reads remain untrimmed (see https://github.com/nanoporetech/dorado/issues/679). I assume the 10x-specific sequences needed for this workflow will still be present, although I have yet to look at any duplex data. So these reads should be processed by the workflow and, as they should have the same UMI, will not be counted twice.
Closing due to lack of response
Is your feature related to a problem?
Duplex reads contain neither primers nor adapters because of how duplex works: the duplex reads themselves have the adapters and primers trimmed off. In the adapter configuration section of wf-single-cell, reads without adapters/primers are categorized as "Others: No valid adapters found; not used in further analysis". So the high-quality duplex reads are useless in the pipeline.
Describe the solution you'd like
Add a wf-single-cell parameter that will use duplex reads that are categorized as Others in the adapter configuration section. The duplex reads have the tag dx:i:1 in the BAM file, or a read id that contains ";". Is there a way to keep these reads?
Describe alternatives you've considered
Or ask Dorado duplex to add an option not to trim primers/adapters.
Or add primer sequences manually to the duplex reads before running wf-single-cell.
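As an illustration of the two duplex markers described above (the dx:i:1 tag, or a ";" in the read id), here is a minimal sketch that separates duplex records from the rest. It works on SAM text lines, for example the output of `samtools view`, rather than on the BAM file directly. The function names `is_duplex` and `split_duplex` are hypothetical and are not part of Dorado or wf-single-cell.

```python
from typing import Iterable, List, Tuple


def is_duplex(sam_line: str) -> bool:
    """Return True if a SAM text record looks like a duplex read.

    A read is treated as duplex if it carries the tag dx:i:1 or if its
    read id (first column) contains ';', as described in this issue.
    """
    fields = sam_line.rstrip("\n").split("\t")
    read_id = fields[0]
    # SAM has 11 mandatory columns; optional tags start at index 11.
    tags = fields[11:]
    return "dx:i:1" in tags or ";" in read_id


def split_duplex(sam_lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split SAM records into (duplex, other), skipping headers and blanks."""
    duplex: List[str] = []
    other: List[str] = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or SAM header line
        (duplex if is_duplex(line) else other).append(line)
    return duplex, other


# Made-up example records (unaligned, so most columns are placeholders).
records = [
    "@HD\tVN:1.6\tSO:unknown",
    "readA;readB\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tdx:i:1",
    "readC\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tdx:i:0",
    "readD\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
duplex_records, other_records = split_duplex(records)
print(len(duplex_records), "duplex,", len(other_records), "other")
```

A split like this only identifies the duplex reads. It does not restore the trimmed primers, so on its own it would not make those reads usable by the workflow.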
Additional context
Thank you for developing such a useful pipeline that works for long-reads.