Reads without adaptor/primer won't be used, so duplex reads will not be used in wf-single-cell?

ErminZ commented 4 months ago

Is your feature related to a problem?

Because of how duplex basecalling works, duplex reads contain neither primers nor adapters: they are trimmed off the duplex reads themselves. In the adapter configuration section of wf-single-cell, reads without adapters/primers are categorized as "Others: No valid adapters found; not used in further analysis". So the high-quality duplex reads are useless in the pipeline.

Describe the solution you'd like

Add a wf-single-cell parameter that allows duplex reads categorized as Others in the adapter configuration section to be used. Duplex reads carry the tag dx:i:1 in the BAM file, or have a read ID that contains ";". Is there a way to keep these reads?
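
To make the two markers above concrete, here is a minimal sketch of how duplex reads could be separated from other reads in SAM text (for example the output of `samtools view`). It relies only on the two markers mentioned above, the `dx:i:1` tag and a ";" in the read ID. The function names `is_duplex` and `split_duplex` are hypothetical and are not part of wf-single-cell or Dorado.

```python
def is_duplex(sam_line: str) -> bool:
    """Return True if a SAM record looks like a duplex read.

    Assumption: a read is duplex if it carries the dx:i:1 tag or if its
    read ID contains ';' (the two markers described in this issue).
    """
    fields = sam_line.rstrip("\n").split("\t")
    read_id = fields[0]
    tags = fields[11:]  # optional tags start at column 12 of a SAM record
    return "dx:i:1" in tags or ";" in read_id


def split_duplex(sam_lines):
    """Partition SAM records into (duplex, other), skipping header lines."""
    duplex, other = [], []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line, not a read
        (duplex if is_duplex(line) else other).append(line)
    return duplex, other
```

The records in either group could then be written back out and inspected separately, for instance to check how many duplex reads end up in the Others category.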

Describe alternatives you've considered

Or ask for a Dorado duplex option that does not trim primers/adapters.

Or add primer sequences manually to the duplex reads before running wf-single-cell.

Additional context

Thank you for developing such a useful pipeline that works for long-reads.

nrhorner commented 3 months ago

Hi @ErminZ

Thanks for your question. I would think the 10x adapters/primers would be kept in the case of duplex reads, but I haven't tested this yet. I will try it out and get back to you.

ErminZ commented 3 months ago

Thank you for your reply! I also just tested about 1 million duplex reads with the single-cell pipeline, and most of the reads have 10x primers. Please let me know if you could explain more about the trimming mechanism used by Dorado or wf-single-cell.

"BIOLOGICAL_duplex": {
        "general": {
            "n_reads": 1065534,
            "rl_mean": 811.6867167073036,
            "rl_std_dev": 446.9855681878009,
            "n_fl": 483380,
            "n_stranded": 989717
        },
        "strand_counts": {
            "n_plus": 565033,
            "n_minus": 424684
        },
        "detailed_config": {
            "adapter1_f-adapter2_f": 278771,
            "adapter1_f": 246063,
            "adapter2_r": 228474,
            "adapter2_r-adapter1_r": 167066,
            "*": 39447,
            "adapter2_f": 18974,
            "adapter1_r": 12826,
            "adapter2_f-adapter1_f": 12190,
            "adapter1_f-adapter2_f-adapter2_r-adapter1_r": 10708,
            "adapter1_f-adapter2_f-adapter2_r": 9347,
            "adapter2_r-adapter1_r-adapter1_f-adapter2_f": 9240,
            "adapter1_r-adapter2_r": 5760,
            "adapter2_r-adapter1_r-adapter1_f": 4170,
            "adapter1_f-adapter2_r-adapter2_f": 2135,
            "adapter2_f-adapter2_r": 2554,
            "adapter1_f-adapter2_r": 2486,
            "adapter2_r-adapter2_f": 2294,
            "adapter1_f-adapter1_r": 1824,
            "adapter1_f-adapter1_r-adapter2_f": 1401,
            "adapter2_r-adapter1_f": 1274,
            "adapter2_r-adapter2_f-adapter1_r": 1158,
            "adapter1_r-adapter1_f": 1237,
            "adapter1_r-adapter1_f-adapter2_f-adapter2_r": 545,
            "adapter2_r-adapter1_f-adapter1_r": 768,
            "adapter2_f-adapter2_r-adapter1_r-adapter1_f": 740,
            "adapter1_r-adapter1_f-adapter2_f": 615,
            "adapter1_f-adapter2_f-adapter1_r-adapter2_r": 480,
            "adapter2_r-adapter1_r-adapter2_f-adapter1_f": 651,
            "adapter2_f-adapter2_r-adapter1_r": 592,
            "adapter2_f-adapter1_f-adapter2_r": 173,
            "adapter1_r-adapter2_f-adapter1_f": 161,
            "adapter1_f-adapter2_f-adapter1_r": 131,
            "adapter2_f-adapter1_r-adapter2_r": 158,
            "adapter2_f-adapter1_r": 110,
            "adapter2_f-adapter1_f-adapter1_r": 93,
            "adapter1_r-adapter2_f": 109,
            "adapter1_f-adapter1_r-adapter2_r": 45,
            "adapter1_f-adapter2_r-adapter1_r": 87,
            "adapter2_r-adapter2_f-adapter1_f": 120,
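
As a rough check of the claim that most reads have 10x primers, the counts above can be turned into fractions. This is only a sketch: it assumes that the "*" entry counts reads in which no adapter was found, which the output itself does not state, and the variable names are chosen here for illustration.

```python
# Counts copied from the "general" block and the "*" entry above.
general = {"n_reads": 1065534, "n_fl": 483380, "n_stranded": 989717}
no_adapter = 39447  # assumption: "*" means no adapter was found


def fraction(count: int, total: int) -> float:
    """Return count as a fraction of total."""
    return count / total


summary = {
    "full_length": fraction(general["n_fl"], general["n_reads"]),
    "stranded": fraction(general["n_stranded"], general["n_reads"]),
    "no_adapter": fraction(no_adapter, general["n_reads"]),
}

for name, value in summary.items():
    print(f"{name}: {value:.1%}")
```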

nrhorner commented 1 month ago

Hi @ErminZ

I believe that duplex reads will have the sequencing adapters trimmed off, while the parent reads remain untrimmed (see https://github.com/nanoporetech/dorado/issues/679). I assume the 10x-specific sequences needed for this workflow should still be present, although I have yet to look at any duplex data. So these reads should be processed by the workflow and, as a duplex read and its parent reads should have the same UMI, they will not be counted twice.

nrhorner commented 1 month ago

Closing due to lack of response

epi2me-labs / wf-single-cell