
Kurt-Hetrick / JHG_Clincal_Exome_Pipeline


add a bcftools split to pipeline mt vcf #53

Closed: Kurt-Hetrick closed this issue 2 days ago

Kurt-Hetrick commented 2 weeks ago

bcftools norm -m- -O z -o Mutect2_split.vcf.gz
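For context, -m- tells bcftools norm to split multiallelic records into biallelic ones. The toy awk function below (split_multiallelic is a hypothetical name) sketches only the core idea of emitting one record per ALT allele. It is not how bcftools implements the split: it copies INFO and sample columns unchanged, whereas bcftools also splits Number=A/R fields and recodes genotypes.

```shell
#!/usr/bin/env bash
# Toy illustration of splitting multiallelic VCF records: one record per ALT.
# Simplification (assumption): INFO and FORMAT/sample columns are copied as-is.
split_multiallelic() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { print; next }                  # pass header lines through
       {
         n = split($5, alts, ",")            # column 5 is ALT
         for (i = 1; i <= n; i++) {
           $5 = alts[i]                      # rebuilds the record with tabs
           print
         }
       }' "$@"                               # files as arguments, else stdin
}

# printf 'chrM\t3\t.\tA\tC,G\t50\tPASS\t.\n' | split_multiallelic
# emits two records, one with ALT=C and one with ALT=G
```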

Kurt-Hetrick commented 2 weeks ago

not sure if this affects metrics generated from this vcf

Kurt-Hetrick commented 2 weeks ago

haplogrep might be affected as well

Kurt-Hetrick commented 1 week ago

First, test piping the former final output to bcftools (to avoid creating a new intermediate file). If a pipe works, then do a pipeline run to see whether haplogrep and the metrics step crash and/or, if they do complete successfully, whether their output differs.

Kurt-Hetrick commented 1 week ago

Piping from GATK to bcftools works, but I would have to upgrade to a GATK 4 release. Also, I think GATK would exit with a non-zero code because it raises an error that it can't write the tribble index. However, the VCF is still written to /dev/stdout, which bcftools can read, and since the exit status of a shell pipeline is that of its last command (unless pipefail is set), the final exit code should be zero as long as bcftools succeeds. Even though there is some jank here, I still prefer this option, since I can keep the same workflow instead of adding another step and readjusting all of the job dependencies in the pipeline. The open question is whether the tools that run on this output (haplogrep and Picard) fail when no tribble index is generated. If they do, or if there is a collision (Picard might generate an index but haplogrep might not), then I would have to add a step anyway.
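The expectation that the final exit code is zero rests on shell semantics: by default a pipeline's exit status is that of its last stage, but under set -o pipefail (common in pipeline scripts) a non-zero exit from the GATK stage would propagate. A minimal bash sketch, with false and true standing in for the GATK and bcftools stages (an assumption for illustration; the function names are hypothetical):

```shell
#!/usr/bin/env bash
# Stand-ins (assumption): `false` plays a GATK step that exits non-zero after
# writing its VCF to stdout; `true` plays a bcftools step that succeeds.
pipe_status_default()  { bash -c 'false | true'; echo $?; }
pipe_status_pipefail() { bash -c 'set -o pipefail; false | true'; echo $?; }
# PIPESTATUS (bash-only) records every stage's status, even without pipefail.
pipe_statuses()        { bash -c 'false | true; echo "${PIPESTATUS[@]}"'; }

pipe_status_default    # prints 0: only the last stage counts
pipe_status_pipefail   # prints 1: the upstream failure propagates
pipe_statuses          # prints "1 0": per-stage statuses
```

If the pipeline scripts do run with pipefail, checking PIPESTATUS is one way to tolerate a known-benign upstream failure while still catching a bcftools failure.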

Kurt-Hetrick commented 1 week ago

Even newer versions of GATK (like the one used for the mito-calling component of the pipeline) now report the inability to write the tribble index as a warning rather than an error, so the exit code should be zero. That makes it a little less janky. It still depends on whether a tribble index is needed in general.

Kurt-Hetrick commented 6 days ago

haplogrep is not affected. I took a sample that had a passing multiallelic site, and the MD5 hash of the haplogrep output was the same for the normalized and un-normalized input.
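The haplogrep check can be scripted as an MD5 comparison of the outputs from the two runs. A minimal sketch: the function name same_md5 and the file names in the usage comment are hypothetical, and it assumes GNU coreutils md5sum (macOS ships md5 instead).

```shell
#!/usr/bin/env bash
# Succeeds (exit 0) when two files have the same MD5 digest.
# Assumes GNU coreutils md5sum.
same_md5() {
  local a b
  # Guard against missing files: two empty digests would otherwise "match".
  [ -f "$1" ] && [ -f "$2" ] || return 2
  a=$(md5sum < "$1" | cut -d' ' -f1)   # md5sum prints "<digest>  -" for stdin
  b=$(md5sum < "$2" | cut -d' ' -f1)
  [ "$a" = "$b" ]
}

# Hypothetical file names for the un-normalized and normalized runs:
# same_md5 sample.haplogrep.txt sample_split.haplogrep.txt && echo "identical"
```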

Kurt-Hetrick commented 6 days ago

A tribble index is required to run Picard (for VCF metrics). I think I'm going to append a tribble index creation step to the normalization task using a conditional statement, rather than creating a new task. At least, that's my current plan.
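One possible shape for that conditional, sketched under assumptions: index_vcf is a hypothetical helper run at the end of the normalization task. For bgzipped output (-O z) the index would be a tabix .tbi rather than a tribble .idx, so the sketch dispatches on the file extension, assuming bcftools index -t for .vcf.gz and gatk IndexFeatureFile -I for plain .vcf (older GATK 4 releases used -F in place of -I). It is worth confirming that the Picard metrics tool accepts the resulting index before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical final step of the normalization task: index the split VCF so
# downstream Picard metrics can read it. Assumed commands:
#   bcftools index -t FILE.vcf.gz      -> FILE.vcf.gz.tbi (tabix index)
#   gatk IndexFeatureFile -I FILE.vcf  -> FILE.vcf.idx    (tribble index)
index_vcf() {
  local vcf=$1
  case "$vcf" in
    *.vcf.gz) bcftools index -t "$vcf" ;;
    *.vcf)    gatk IndexFeatureFile -I "$vcf" ;;
    *)
      echo "index_vcf: unrecognized VCF extension: $vcf" >&2
      return 1
      ;;
  esac
}

# e.g., appended after the split (upstream producer elided):
#   ... | bcftools norm -m- -O z -o Mutect2_split.vcf.gz && index_vcf Mutect2_split.vcf.gz
```

Dispatching on the extension keeps a single task working whether or not the split output is compressed, which fits the goal of not adding a new task and its job dependencies.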

Kurt-Hetrick commented 2 days ago

done