McMinds-Lab / devil-variant-pipeline

Pipeline to process NGS data and perform joint variant calling

devil-variant-pipeline

Pipeline to process NGS probe-capture data from Tasmanian devils and devil facial tumour disease (DFTD) samples.
The pipeline starts from raw FastQ files and produces joint variant calls for both the devil
hosts and the tumors.

===Primary Pipeline===


1_setup.sh [Move reads into a single folder, rename read files, and generate the batch directory structure]
2_pre-qc.sh [Perform initial QC on the raw reads using FastQC and MultiQC]
3_trim.sh [Trim reads using Trim Galore! with settings chosen from the 2_pre-qc results]
4_post-qc.sh [Collate the FastQC reports produced by Trim Galore! and run MultiQC on them]
5_align.sh [Align reads and generate sorted BAMs with marked duplicates as follows: BWA MEM > Picard SortSam > Picard MarkDuplicates]
6_combine.sh [Combine reports from 5_align using MultiQC and a custom script that summarizes Picard CollectHsMetrics output]
7_gvcf.sh [Generate per-sample GVCFs using GATK HaplotypeCaller in GVCF mode]
8_genomicsDB.sh [Create or update a tumor or host GenomicsDB workspace]
9a_genotypeGVCFs.sh [Run GenotypeGVCFs, processing regions of the genome with equal total bp in parallel]
9b_genotypeGVCFs.sh [Combine the jointly genotyped regions, separate indels and SNPs, and apply basic hard filters]
10_variant_stats.sh [Generate summary tables and figures for a VCF (essential for choosing appropriate filters)]
ATOMM.sh [TODO: automate the ATOMM run, including selecting SNPs for the interaction test based on marginal tests]
gemma.sh [Run GEMMA BSLMM on all samples or with leave-one-out blocked cross-validation. TODO: generate phenotype predictions and calculate accuracy]
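As an illustration of the kind of summary 6_combine.sh produces, here is a minimal Python sketch that pulls selected columns out of Picard CollectHsMetrics output files and writes one row per sample. It assumes Picard's standard metrics-file layout (a `## METRICS CLASS` line, a tab-separated header line, then data rows ending at a blank line). The column choice and the sample-naming rule are hypothetical, and this is not the repository's own script.

```python
import csv
import sys
from pathlib import Path

# Real CollectHsMetrics column names; which ones the pipeline's own
# summary script reports is an assumption.
DEFAULT_COLUMNS = ["MEAN_TARGET_COVERAGE", "PCT_SELECTED_BASES",
                   "PCT_TARGET_BASES_20X", "FOLD_ENRICHMENT"]


def parse_picard_metrics(lines):
    """Return the rows of the METRICS section of a Picard metrics file as dicts."""
    lines = iter(lines)
    for line in lines:
        if line.startswith("## METRICS CLASS"):
            break
    else:
        return []  # no metrics section found
    header_line = next(lines, None)
    if header_line is None:
        return []
    header = header_line.rstrip("\n").split("\t")
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            break  # a blank line or the HISTOGRAM section ends the table
        rows.append(dict(zip(header, line.split("\t"))))
    return rows


def summarize(paths, columns=DEFAULT_COLUMNS, out=sys.stdout):
    """Write one tab-separated row per metrics row, labelled by sample."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample"] + list(columns))
    for path in paths:
        # Hypothetical naming rule: sample ID is the file name up to the first '.'
        sample = Path(path).name.split(".")[0]
        with open(path) as fh:
            for row in parse_picard_metrics(fh):
                writer.writerow([sample] + [row.get(c, "NA") for c in columns])


if __name__ == "__main__":
    summarize(sys.argv[1:])
```

It could be run as `python summarize_hs_metrics.py *.hs_metrics.txt > hs_summary.tsv` (script and file names hypothetical). Missing columns are written as `NA` rather than raising, so older Picard versions with fewer fields still produce a table.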
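The equal-bp splitting in 9a_genotypeGVCFs.sh could work roughly as below. This is a minimal Python sketch, assuming the reference has a samtools `.fai` index, that packs contigs (splitting them where needed) into chunks of near-equal total length. The function names are hypothetical, and the real script may choose its regions differently.

```python
def split_equal_bp(contig_lengths, n_chunks):
    """Partition contigs into at most n_chunks groups of near-equal total bp.

    contig_lengths: list of (name, length) pairs, e.g. from a .fai index.
    Returns a list of chunks; each chunk is a list of (name, start, end)
    intervals in 1-based, inclusive coordinates. Contigs may be split
    across chunk boundaries.
    """
    total = sum(length for _, length in contig_lengths)
    target = -(-total // n_chunks)  # ceiling division: n_chunks * target >= total
    chunks, current, filled = [], [], 0
    for name, length in contig_lengths:
        start = 1
        while start <= length:
            take = min(length - start + 1, target - filled)
            end = start + take - 1
            current.append((name, start, end))
            filled += take
            start = end + 1
            if filled == target:  # chunk is full; start a new one
                chunks.append(current)
                current, filled = [], 0
    if current:
        chunks.append(current)
    return chunks


def read_fai(path):
    """Read (name, length) pairs from a samtools faidx index (.fai)."""
    with open(path) as fh:
        return [(f[0], int(f[1]))
                for f in (line.split("\t") for line in fh if line.strip())]


def to_intervals(chunk):
    """Format a chunk as GATK-style 'contig:start-end' interval strings."""
    return [f"{name}:{start}-{end}" for name, start, end in chunk]
```

Each chunk's intervals could be written one per line to a `.list` file and passed to a separate GenotypeGVCFs job with `-L`. Splitting inside contigs keeps the jobs balanced when contig sizes vary widely; splitting only at contig boundaries is simpler but can leave one job running much longer than the rest.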
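The basic hard filters in 9b_genotypeGVCFs.sh would normally be applied with GATK VariantFiltration. To show the logic, this minimal Python sketch applies the INFO-based part of GATK's generic starting-point recommendations (QUAL omitted) separately to SNP and indel records. The thresholds and filter names are assumptions, not the values the script uses; since 10_variant_stats.sh is described as essential for proper filtering, dataset-specific values are expected.

```python
# name: (INFO key, comparison, threshold); assumed GATK generic defaults
SNP_FILTERS = {
    "QD2": ("QD", "<", 2.0),
    "FS60": ("FS", ">", 60.0),
    "SOR3": ("SOR", ">", 3.0),
    "MQ40": ("MQ", "<", 40.0),
    "MQRankSum-12.5": ("MQRankSum", "<", -12.5),
    "ReadPosRankSum-8": ("ReadPosRankSum", "<", -8.0),
}
INDEL_FILTERS = {
    "QD2": ("QD", "<", 2.0),
    "FS200": ("FS", ">", 200.0),
    "ReadPosRankSum-20": ("ReadPosRankSum", "<", -20.0),
}


def parse_info(info_field):
    """Turn 'QD=12.3;DB' into {'QD': '12.3', 'DB': True}."""
    out = {}
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def failed_filters(vcf_line, filters):
    """Return the names of filters a VCF data line fails."""
    info = parse_info(vcf_line.rstrip("\n").split("\t")[7])
    failed = []
    for name, (key, op, threshold) in filters.items():
        if key not in info or info[key] is True:
            continue  # missing annotations pass, like VariantFiltration's default
        value = float(info[key])
        if (op == "<" and value < threshold) or (op == ">" and value > threshold):
            failed.append(name)
    return failed


def apply_filters(vcf_line, filters):
    """Set the FILTER column to PASS or to the ';'-joined failed filter names."""
    fields = vcf_line.rstrip("\n").split("\t")
    failed = failed_filters(vcf_line, filters)
    fields[6] = ";".join(failed) if failed else "PASS"
    return "\t".join(fields)
```

Keeping separate threshold tables for SNPs and indels mirrors the split into separate SNP and indel files, since indel annotations such as FS have different typical ranges.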
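For the leave-one-out blocked cross-validation in gemma.sh, one common approach is to set the phenotypes of a held-out block to `NA`, which GEMMA treats as missing and excludes from fitting, and then compare predictions for those samples with their true values. This minimal Python sketch builds the folds, masks phenotypes, and computes a Pearson correlation as one possible accuracy measure for the TODO. How blocks are defined (e.g. by sampling site or batch) and all helper names are assumptions.

```python
import math


def blocked_folds(sample_blocks):
    """Yield (held_out_block, train_samples, test_samples), one fold per block.

    sample_blocks: list of (sample_id, block_label) pairs.
    """
    blocks = {}
    for sample, block in sample_blocks:
        blocks.setdefault(block, []).append(sample)
    for held_out, test in blocks.items():
        train = [s for s, b in sample_blocks if b != held_out]
        yield held_out, train, test


def mask_phenotypes(phenotypes, test_samples):
    """Return phenotype values with held-out samples replaced by 'NA'."""
    test = set(test_samples)
    return ["NA" if sample in test else value for sample, value in phenotypes]


def pearson(xs, ys):
    """Pearson correlation between predicted and observed phenotypes."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Holding out whole blocks rather than single individuals gives a more conservative accuracy estimate when samples within a block are related or share an environment.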