Kennedy-Lab-UW / Duplex-Seq-Pipeline

A standalone end-to-end data analysis pipeline for Duplex Sequencing
Other

patch v2.1.2 #93

Closed: bkohrn closed this 3 years ago

bkohrn commented 3 years ago

v2.1.2: Bugfix:

Internal Changes:

bkohrn commented 3 years ago

As far as I can tell, this is working on WSL1. Before merging, it should be tested on WSL2, Linux, and Mac OS X (at a minimum).

bkohrn commented 3 years ago

I can probably manage a Mac OS X test.

bkohrn commented 3 years ago

Just realized I still need to update the README due to a slight change in the produced outputs.

bkohrn commented 3 years ago

Just realized: this version of the UCM (UnifiedConsensusMaker.py) produces different output (something in the mononucleotide repeat detection isn't working properly). As it stands (in v2.1.1), the cm_stats file looks like this:

Consensus Making Statistics:
Command: /home/kohrnb/bioinformatics/Duplex-Seq-Pipeline-v2.0.0/scripts/UnifiedConsensusMaker.py --input /dev/stdin --taglen 8 --spacerlen 1 --loclen 8 --write-sscs --prefix noProf.test_main.1868MCL --tagstats --cutoff 0.7 --Ncutoff 0.0142 --numCores 8 --minmem 3 --maxmem 200
Started at Friday, 25. June 2021 10:28AM
Finished at Friday, 25. June 2021 10:39AM
6055981 reads UMI processed
6055980 reads processed
1229764 families processed
    622052 unrepresented families
    566372 families with family size < 3
    1262 families (DCS pairs) filtered for UMIs with mononucleotide repeats
663392 SSCS made
    0 SSCS filtered for excessive Ns
216912 DCS made
    708996 DCS failed due to missing SSCS
    27228 DCS filtered for excessive Ns

The current dev version looks like this:

Consensus Making Statistics:
Command: /home/kohrnb/bioinformatics/dev/DS_main/Duplex-Seq-Pipeline/scripts/UnifiedConsensusMaker.py --input /dev/stdin --taglen 8 --spacerlen 1 --loclen 8 --write-sscs --prefix noProf.devTest_main.1868MCL --tagstats --cutoff 0.7 --Ncutoff 0.0142 --numCores 8 --minmem 3 --maxmem 200
Started at Friday, 25. June 2021 10:20AM
Finished at Friday, 25. June 2021 10:24AM
6055981 reads UMI processed
514 reads processed
158 families processed
    102 unrepresented families
    94 families with family size < 3
    6055467 families (DCS pairs) filtered for UMIs with mononucleotide repeats
    0 families filtered for Ns in the UMI
64 SSCS made
    0 SSCS filtered for excessive Ns
18 DCS made
    112 DCS failed due to missing SSCS
    2 DCS filtered for excessive Ns

Some of this is on me for not making these changes in a separate branch from the dev branch.
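For context on what the mononucleotide-repeat filter is meant to catch, here is a minimal hypothetical sketch. It is not the logic in UnifiedConsensusMaker.py, which is not shown in this thread. It assumes that a duplex UMI is the concatenation of two tags of length `--taglen` (8 in the commands above) and that a family is flagged when either tag is a single repeated base. The function names `is_mononucleotide_repeat` and `should_filter_umi` are illustrative only.

```python
# Hypothetical sketch only; the pipeline's actual filter may use a different rule.

def is_mononucleotide_repeat(tag: str) -> bool:
    """Return True if every base in the tag is the same nucleotide (e.g. 'AAAAAAAA')."""
    return len(tag) > 0 and len(set(tag.upper())) == 1


def should_filter_umi(umi: str, taglen: int = 8) -> bool:
    """Flag a duplex UMI if either of its two tags is a homopolymer.

    Assumes the UMI is two tags of length `taglen` joined together,
    with any spacer bases already removed.
    """
    if len(umi) != 2 * taglen:
        raise ValueError(f"expected a UMI of length {2 * taglen}, got {len(umi)}")
    first, second = umi[:taglen], umi[taglen:]
    return is_mononucleotide_repeat(first) or is_mononucleotide_repeat(second)


umis = ["ACGTACGTTTGCAAGC", "AAAAAAAACGTACGTA", "ACGTACGTGGGGGGGG", "ACGTACGTACGTACGT"]
flagged = [u for u in umis if should_filter_umi(u)]
print(f"{len(flagged)} of {len(umis)} UMIs flagged")  # prints "2 of 4 UMIs flagged"
```

Under a uniform-random assumption, only 4 of the 4^8 = 65,536 possible 8-nt tags are homopolymers. A filter like this should therefore remove a small fraction of families, as in the v2.1.1 run (1262 families), not nearly all reads as in the dev run.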

bkohrn commented 3 years ago

OK, I now believe this produces the same output as v2.1.1.
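One way to check a claim like "same output as v2.1.1" is to compare the counts in two cm_stats files while ignoring lines that always differ between runs (the command path and the timestamps). The sketch below is a hypothetical helper, not part of the pipeline. It assumes every count line starts with an integer followed by a description, as in the excerpts above.

```python
import re

# Lines expected to differ between otherwise identical runs (paths, timestamps).
VOLATILE_PREFIXES = ("Command:", "Started at", "Finished at")
# A count line: optional indentation, an integer, then its description.
COUNT_LINE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def parse_cm_stats(text: str) -> dict[str, int]:
    """Map each count description (e.g. 'SSCS made') to its integer value."""
    counts = {}
    for line in text.splitlines():
        if line.strip().startswith(VOLATILE_PREFIXES):
            continue
        match = COUNT_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def diff_cm_stats(old: str, new: str) -> dict[str, tuple]:
    """Return {description: (old_value, new_value)} for every count that differs.

    A counter present in only one file shows up with None on the other side.
    """
    a, b = parse_cm_stats(old), parse_cm_stats(new)
    return {k: (a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}


# Excerpts copied from the two cm_stats outputs quoted earlier in this thread.
v211 = """6055981 reads UMI processed
6055980 reads processed
1229764 families processed
    1262 families (DCS pairs) filtered for UMIs with mononucleotide repeats
663392 SSCS made"""

dev = """6055981 reads UMI processed
514 reads processed
158 families processed
    6055467 families (DCS pairs) filtered for UMIs with mononucleotide repeats
    0 families filtered for Ns in the UMI
64 SSCS made"""

for name, (old, new) in diff_cm_stats(v211, dev).items():
    print(f"{name}: {old} -> {new}")
```

Counters added in the newer version (here, "families filtered for Ns in the UMI") appear as `None -> 0`, so a "same output" check may need to skip them. In the dev excerpt, reads processed (514) plus the mononucleotide-filter count (6055467) equals reads UMI processed (6055981). This suggests that, in that build, the filter was removing nearly every read.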