epi2me-labs / wf-single-cell


Reads without adaptor/primer won't be used, so duplex reads will not be used in wf-single-cell? #127

Open ErminZ opened 1 month ago

ErminZ commented 1 month ago

Is your feature related to a problem?

Duplex reads do not contain primers or adapters: because of how duplex basecalling works, the adapters and primers are trimmed off the duplex reads themselves. In the adapter configuration section of wf-single-cell, reads without adapters/primers are categorized as "Others: No valid adapters found; not used in further analysis". As a result, the high-quality duplex reads are useless in the pipeline.

Describe the solution you'd like

Add a wf-single-cell parameter that allows duplex reads categorized as Others in the adapter configuration section to be used. Duplex reads carry the tag dx:i:1 in the BAM file, or have a read ID that contains ";". Is there a way to keep these reads?
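To make the two detection criteria above concrete, here is a minimal sketch in plain Python that checks a single SAM-formatted text record (for example, one line of `samtools view` output) for either the dx:i:1 tag or a ";" in the read ID. The function name `is_duplex` and the example records are hypothetical and only illustrate the idea; this is not part of wf-single-cell or Dorado.

```python
def is_duplex(sam_line: str) -> bool:
    """Return True if a SAM text record looks like a duplex read.

    Uses the two criteria described in this issue:
    the optional tag dx:i:1, or a ';' in the read ID.
    """
    fields = sam_line.rstrip("\n").split("\t")
    read_id = fields[0]
    # SAM records have 11 mandatory columns; optional tags follow.
    tags = fields[11:]
    return "dx:i:1" in tags or ";" in read_id


# Hypothetical unaligned records, for illustration only.
mandatory = ["4", "*", "0", "0", "*", "*", "0", "0", "ACGT", "IIII"]
simplex_line = "\t".join(["read-a"] + mandatory + ["dx:i:0"])
duplex_line = "\t".join(["read-a;read-b"] + mandatory + ["dx:i:1"])

for line in (simplex_line, duplex_line):
    print(line.split("\t")[0], is_duplex(line))
```

A filter like this could be applied to the records in the Others category to see how many of them are actually duplex reads before deciding whether a new parameter is needed.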

Describe alternatives you've considered

Or ask the Dorado duplex developers to add an option not to trim primers/adapters.

Or add primer sequences manually to the duplex reads before running wf-single-cell.

Additional context

Thank you for developing such a useful pipeline that works for long-reads.

nrhorner commented 1 month ago

Hi @ErminZ

Thanks for your question. I would expect the 10x adapters/primers to be kept in the case of duplex reads, but I haven't tested this yet. I will try it out and get back to you.

ErminZ commented 1 month ago

Thank you for your reply! I also just tested 1 million duplex reads with the single-cell pipeline, and most of the reads have 10x primers. Please let me know if you could explain more about the trimming mechanism used by Dorado or wf-single-cell.
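As a rough check of "most of the reads have 10x primers", the counts from the summary in this comment can be turned into percentages. The sketch below copies four numbers from that summary and assumes that the "*" entry in detailed_config counts reads with no adapter hit; that interpretation is an assumption, not something confirmed in this thread. The helper `pct` is a hypothetical name.

```python
# Counts copied from the "BIOLOGICAL_duplex" summary in this comment.
n_reads = 1065534
n_fl = 483380        # full-length reads
n_stranded = 989717  # reads assigned a strand
n_star = 39447       # "*" entry; assumed to mean no adapter found


def pct(count: int, total: int) -> float:
    """Percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


summary = {
    "no_adapter_pct": pct(n_star, n_reads),
    "full_length_pct": pct(n_fl, n_reads),
    "stranded_pct": pct(n_stranded, n_reads),
}

for name, value in summary.items():
    print(f"{name}: {value}%")
```

Under that assumption, only a small fraction of the duplex reads in this test lack any adapter, while under half are classified as full length.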

"BIOLOGICAL_duplex": {
        "general": {
            "n_reads": 1065534,
            "rl_mean": 811.6867167073036,
            "rl_std_dev": 446.9855681878009,
            "n_fl": 483380,
            "n_stranded": 989717
        },
        "strand_counts": {
            "n_plus": 565033,
            "n_minus": 424684
        },
        "detailed_config": {
            "adapter1_f-adapter2_f": 278771,
            "adapter1_f": 246063,
            "adapter2_r": 228474,
            "adapter2_r-adapter1_r": 167066,
            "*": 39447,
            "adapter2_f": 18974,
            "adapter1_r": 12826,
            "adapter2_f-adapter1_f": 12190,
            "adapter1_f-adapter2_f-adapter2_r-adapter1_r": 10708,
            "adapter1_f-adapter2_f-adapter2_r": 9347,
            "adapter2_r-adapter1_r-adapter1_f-adapter2_f": 9240,
            "adapter1_r-adapter2_r": 5760,
            "adapter2_r-adapter1_r-adapter1_f": 4170,
            "adapter1_f-adapter2_r-adapter2_f": 2135,
            "adapter2_f-adapter2_r": 2554,
            "adapter1_f-adapter2_r": 2486,
            "adapter2_r-adapter2_f": 2294,
            "adapter1_f-adapter1_r": 1824,
            "adapter1_f-adapter1_r-adapter2_f": 1401,
            "adapter2_r-adapter1_f": 1274,
            "adapter2_r-adapter2_f-adapter1_r": 1158,
            "adapter1_r-adapter1_f": 1237,
            "adapter1_r-adapter1_f-adapter2_f-adapter2_r": 545,
            "adapter2_r-adapter1_f-adapter1_r": 768,
            "adapter2_f-adapter2_r-adapter1_r-adapter1_f": 740,
            "adapter1_r-adapter1_f-adapter2_f": 615,
            "adapter1_f-adapter2_f-adapter1_r-adapter2_r": 480,
            "adapter2_r-adapter1_r-adapter2_f-adapter1_f": 651,
            "adapter2_f-adapter2_r-adapter1_r": 592,
            "adapter2_f-adapter1_f-adapter2_r": 173,
            "adapter1_r-adapter2_f-adapter1_f": 161,
            "adapter1_f-adapter2_f-adapter1_r": 131,
            "adapter2_f-adapter1_r-adapter2_r": 158,
            "adapter2_f-adapter1_r": 110,
            "adapter2_f-adapter1_f-adapter1_r": 93,
            "adapter1_r-adapter2_f": 109,
            "adapter1_f-adapter1_r-adapter2_r": 45,
            "adapter1_f-adapter2_r-adapter1_r": 87,
            "adapter2_r-adapter2_f-adapter1_f": 120,