Closed by bguo068 1 year ago
We are going to use the BAM files from the GATK_APPLY_BQSR process.
This issue is more a matter of making pseudo samples than of extensively modifying the pipeline.
Subsample the fq.gz files using seqtk sample.
Make a fastq_map.tsv table with these pseudo samples' fq.gz files.
The fastq_map.tsv file can be prepared by Python, R or Excel.
A --coverage_only option allows the pipeline to stop after generating the bedtools coverage file.
If --coverage_only true is specified in the nextflow command line, the pipeline will stop when coverage is calculated. No further SNP calling steps will be executed. @Matthew-A-epi
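The subsampling and table-building steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: the column names (`sample`, `mate`, `fastq`), the `_n<reads>` naming of pseudo samples, the `subsampled` output directory and the seed value are all hypothetical, and the real `fastq_map.tsv` header used by the pipeline should be substituted. The read counts are the ones from the loop quoted in this thread.

```python
import csv
import gzip
import shutil
import subprocess
from pathlib import Path

# Read counts taken from the seqtk loop quoted in this thread.
DEPTHS = [500_000, 1_000_000, 2_000_000, 3_000_000, 5_000_000,
          10_000_000, 15_000_000, 20_000_000, 25_000_000, 30_000_000]


def seqtk_command(fastq, n_reads, seed=100):
    """Build the `seqtk sample` argv; reusing one seed for R1 and R2 keeps mates paired."""
    return ["seqtk", "sample", f"-s{seed}", str(fastq), str(n_reads)]


def pseudo_sample_rows(sample, r1, r2, depths=DEPTHS, outdir="subsampled"):
    """Return one row per (depth, mate). The dict keys are hypothetical column names."""
    rows = []
    for n in depths:
        pseudo = f"{sample}_n{n}"
        for mate, source in ((1, r1), (2, r2)):
            out = Path(outdir) / f"{pseudo}_R{mate}.fq.gz"
            rows.append({"sample": pseudo, "mate": mate,
                         "source": str(source), "fastq": str(out), "n_reads": n})
    return rows


def subsample(row, seed=100):
    """Run seqtk (must be on PATH) and gzip its stdout into the row's fastq path."""
    out = Path(row["fastq"])
    out.parent.mkdir(parents=True, exist_ok=True)
    cmd = seqtk_command(row["source"], row["n_reads"], seed)
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc, gzip.open(out, "wb") as fh:
        shutil.copyfileobj(proc.stdout, fh)
    if proc.returncode != 0:
        raise RuntimeError(f"seqtk failed for {row['source']}")


def write_map(rows, path, columns=("sample", "mate", "fastq")):
    """Write a tab-separated table; adjust `columns` to the pipeline's real header."""
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(columns), delimiter="\t",
                                extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
```

Calling `subsample(row)` for each row from `pseudo_sample_rows(...)` and then `write_map(rows, "fastq_map.tsv")` would give a table that can be passed to the pipeline together with `--coverage_only true`.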
Closing with the commit above.
copied from https://github.com/umb-oconnorgroup/plasmodium_snp_call_pipeline_wdl/issues/3
Add support for subsampling BAM files to determine whether samples were sequenced deeply enough. More specifically, the final goal is to determine metrics such as % genome coverage or % genome coverage at 5x as a function of the number of subsampled reads.
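As an illustration of the metrics described here, the sketch below computes the percentage of positions covered at or above given depth thresholds. It assumes a per-base depth table with three tab-separated columns (chromosome, position, depth), which is the layout `bedtools genomecov -d` produces. The coverage file written by the pipeline may be laid out differently, and the function name and default thresholds are hypothetical.

```python
def coverage_percentages(lines, thresholds=(1, 5)):
    """Return {threshold: % of positions with depth >= threshold}.

    `lines` is any iterable of 'chrom<TAB>pos<TAB>depth' strings (assumed layout).
    """
    total = 0
    covered = {t: 0 for t in thresholds}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        depth = int(fields[2])
        total += 1
        for t in thresholds:
            if depth >= t:
                covered[t] += 1
    if total == 0:
        raise ValueError("no positions found in input")
    return {t: 100.0 * count / total for t, count in covered.items()}


# Tiny made-up example: four positions with depths 0, 3, 5 and 10.
example = ["chr1\t1\t0", "chr1\t2\t3", "chr1\t3\t5", "chr1\t4\t10"]
print(coverage_percentages(example))  # {1: 75.0, 5: 50.0}
```

Running this once per pseudo sample and plotting the results against the subsampled read count would give the rarefaction-style curve the issue asks for.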
Bash code used in another pipeline that uses seqtk to subsample the given number of reads for generation of BAM files:
Subsetting for rarefaction analysis (500K, 1M, 2M, 3M, 5M, 10M, 15M, 20M, 25M, 30M)
for i in 500000 1000000 2000000 3000000 5000000 10000000 15000000 20000000 25000000 30000000
Below is code used to create recalibrated coverage files from the resulting output using bedtools.