Closed Kurt-Hetrick closed 2 days ago
Not sure if this affects the metrics generated from this VCF.
Haplogrep might be affected as well.
First, test piping the former final output to bcftools (to avoid creating a new intermediate file). If a pipe works, then do a pipeline run to see whether haplogrep and the metrics step crash, and/or whether their output differs if they do complete successfully.
Piping from gatk to bcftools works, but I would have to upgrade to a GATK 4 version. Also, I think gatk would produce a non-zero exit code because of an error saying it can't write the tribble index. However, the VCF is produced on the /dev/stdout stream, which bcftools can consume, and since bcftools runs successfully the final exit code of the pipe should be zero. Even though there is some jank here, I'd still prefer this option, since I can keep the same workflow instead of adding another step and readjusting all of the job dependencies in the pipeline. The open question is whether the tools that run on this output (haplogrep and picard) fail without a tribble index. If they can't run, or if there is a collision (picard might generate one, but haplogrep might not), then I would have to add a step anyway.
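The exit-code behavior described above can be demonstrated with plain shell: by default a pipeline's exit status is the status of its last command, so a failing upstream stage (gatk erroring on the tribble index) is masked as long as bcftools succeeds — unless the workflow runs with `pipefail`. A minimal illustration, with `false` standing in for the failing gatk step and `true` for bcftools:

```shell
#!/bin/bash
# By default a pipeline's exit status is that of the LAST command,
# so a failing upstream stage is masked when the downstream stage succeeds.
false | true
status_default=$?
echo "without pipefail: $status_default"   # 0: upstream failure hidden

# With pipefail, any failing stage makes the whole pipeline non-zero.
set -o pipefail
if false | true; then
    status_pipefail=0
else
    status_pipefail=$?
fi
echo "with pipefail: $status_pipefail"     # 1
```

So whether the final exit code is zero depends on whether the workflow engine runs the command with `pipefail` enabled.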
Even newer versions of GATK (like the one used for the mito calling component of the pipeline) now treat the inability to write the tribble index as a warning rather than an error, so the exit code should be zero. A little less janky. It still depends on whether a tribble index is actually needed downstream.
Haplogrep is not affected. I took a sample that had a passing multiallelic site, and the resulting md5 hash was the same between the normalized and un-normalized input.
A tribble index is required to run picard (for the VCF metrics). I think I'm going to append a tribble-index creation step to the normalization task using a conditional statement, as opposed to creating a new task. At least that's the current plan.
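The "conditional statement" plan might look roughly like this sketch (not the actual workflow code — `NEEDS_TRIBBLE_INDEX` is a hypothetical flag the workflow would set, and `gatk IndexFeatureFile` is the GATK 4 tool that writes a tribble `.idx` for a plain VCF):

```shell
# Sketch: extend the normalization task with a guarded indexing step
# rather than adding a new task to the pipeline.
normalize_and_index() {
    in_vcf="$1"
    out_vcf="$2"
    # split multiallelic records into separate lines
    bcftools norm -m- -o "$out_vcf" "$in_vcf"
    # picard's vcf-metrics step needs a tribble (.idx) index; haplogrep
    # does not, so only build one when the flag is set
    if [ "${NEEDS_TRIBBLE_INDEX:-no}" = "yes" ]; then
        gatk IndexFeatureFile -I "$out_vcf"
    fi
}
```

This keeps the job-dependency graph unchanged, since the indexing happens inside the existing normalization task.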
done
The normalization command that ended up in the task (note that `-O z` writes bgzipped output, so the output file should carry a `.vcf.gz` extension; `-` reads the VCF from the upstream pipe):

```shell
bcftools norm -m- -O z -o Mutect2_split.vcf.gz -
```