Kennedy-Lab-UW / Duplex-Seq-Pipeline

A standalone end-to-end data analysis pipeline for Duplex Sequencing

V2.0.0 prerelease #75

Closed bkohrn closed 3 years ago

bkohrn commented 3 years ago

Release merge for v2.0.0. Changes:
Bugfixes:
- Add definition of negative taxIDs to report.
- Fix bed blocks issue where terminal comma would cause crashes.
- Fix table formatting for countmuts and depth tables in report.
- Change clustering to avoid using SNPs for clustering analysis.
- Explicitly convert report to HTML.
- Fix fastq output third line from consensusMaker.
- Set quality of N bases from consensusMaker to 0.
- Fix crash on non-CATG bases in the reference genome.
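For context on the terminal-comma fix: BED12 `blockSizes` and `blockStarts` fields are comma-separated lists that may end in a comma, so a naive `split(",")` followed by `int()` fails on the trailing empty item. A minimal sketch of tolerant parsing follows; `parse_block_field` is a hypothetical helper name used for illustration, not the pipeline's actual BedParser code.

```python
from typing import List


def parse_block_field(field: str) -> List[int]:
    """Parse a BED12 blockSizes/blockStarts field such as "10,20," into ints.

    Hypothetical helper for illustration only.
    """
    # "10,20,".split(",") yields ["10", "20", ""]; skip empty items so that
    # int("") is never called (it would raise ValueError).
    return [int(item) for item in field.split(",") if item.strip()]


print(parse_block_field("10,20,"))  # trailing comma tolerated
print(parse_block_field("10,20"))   # no trailing comma
```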

New Features:
- Change recovery script format.
- Create new test data set that can better demonstrate the BLAST filter.
- Create new test reference / blastDB to match new test dataset.
- Add extra tests to the test config file.
- Make retrieveSummary.py work from the whole-pipeline config file.
- Reorganize BLAST control to make running without BLAST more explicit.
- Add unlock script to setup script.
- Implement new depth script.
- Implement VarDict.
- Move the PostBlastRecovery to its own environment, allowing custom user programs without affecting the base environment.
- Create script to summarize depth based on a provided bed file.
- Add the ability to select which filters the mutation frequency program applies.
- Add % mapped raw reads to the summary CSV.
- Add RawOnTarget to summaryCSV.
- Add masking functionality.
- Change countMutsPerCycle to allow filtering out mutations based on VCF filters, and to allow filtering of near-indel variants. It also allows an "include" mode that includes only variants in the VCF file.
- Modify Snakefile to allow this method to draw from the countmuts filtering parameters.
- Add adapter clipping.
- Add Mamba frontend to setup script.
- Add readout for % on-target SSCS and DCS to summaryCSV and report.
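The countMutsPerCycle change describes two modes: dropping variants that carry certain VCF filters, and an "include" mode that keeps only variants present in the VCF. The sketch below shows one possible reading of that behavior. The function name `select_variants`, the `(chrom, pos, ref, alt)` key, and the dictionary layout are all assumptions made for illustration; the actual script's interface and semantics may differ.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical variant key: (chrom, pos, ref, alt)
Variant = Tuple[str, int, str, str]


def select_variants(
    observed: List[Variant],
    vcf_filters: Dict[Variant, Set[str]],
    drop_filters: Set[str],
    include_mode: bool = False,
) -> List[Variant]:
    """Return the observed variants that survive VCF-based filtering.

    vcf_filters maps each variant found in the VCF to its FILTER values.
    """
    kept = []
    for variant in observed:
        if include_mode:
            # Assumed "include" semantics: keep only variants listed in the VCF.
            if variant in vcf_filters:
                kept.append(variant)
        else:
            # Assumed default semantics: drop variants carrying any unwanted filter.
            if not (vcf_filters.get(variant, set()) & drop_filters):
                kept.append(variant)
    return kept
```

Under these assumptions, a variant absent from the VCF is kept in the default mode and dropped in "include" mode.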

Internal Changes:
- Add gitignore rule to ignore user-generated recovery scripts.
- Remove chrM_recovery, which is a custom recovery script from our lab.
- Remove testConfig.csv, since it is created by the setup script.
- Remove GATK3 from setup and Snakefile.
- Add vardict-java to environment.
- Modify MakeDepthPlot.R and retrieveSummary.py to point to new files.
- Add VarDict-based Muts by Read Position program (not used).
- Give final read length to MutsPerCycle, instead of initial read length.
- Remove extra environment setup rules.
- Move BedParser to separate file.
- Add pre-variant calling BAM filter to BED coordinates.
- Make prevar file temporary.
- Add error checking to enforce number of blockStarts and blockSizes.
- Add str and repr methods for Bed_Line.
- Add Bed_Writer functionality.
- Add DepthSummaryCsv to Snakefile.
- Add filter definitions to mutation frequency output and report.
- Verify that a variant consists entirely of ACGTN bases.
- Change R versioning in DS_env_full.
- Add a bed buffering step pre-vardict.
- Add bedtools to run environment.
- Change BLAST database setup and application.
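Two of the internal checks listed here (enforcing the number of blockStarts and blockSizes, and verifying that a variant consists entirely of ACGTN bases) can be sketched as below. The function names are hypothetical, and the choice to accept lowercase bases is an assumption; this is not the code from the pipeline's BedParser or variant handling.

```python
import re
from typing import List

# Matches a non-empty string made up only of A, C, G, T, or N.
_ACGTN = re.compile(r"[ACGTN]+")


def is_acgtn(allele: str) -> bool:
    """Return True if the allele consists entirely of ACGTN bases.

    Assumption: lowercase input is accepted by upper-casing first.
    """
    return _ACGTN.fullmatch(allele.upper()) is not None


def check_block_counts(
    block_count: int, block_sizes: List[int], block_starts: List[int]
) -> None:
    """Raise ValueError if a BED12 line's block lists disagree with blockCount."""
    if len(block_sizes) != block_count or len(block_starts) != block_count:
        raise ValueError(
            f"blockCount is {block_count}, but found {len(block_sizes)} "
            f"blockSizes and {len(block_starts)} blockStarts"
        )
```

Raising early on a malformed BED line gives a clear error message instead of a later indexing failure.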