Kurt-Hetrick / RANDOM

random scripts that I either write for people or that I routinely use

SELECT_SAMPLES_FROM_MS_VCF.sh: DRAGEN version. fix their VCF so it passes validations. #22

Closed Kurt-Hetrick closed 4 months ago

Kurt-Hetrick commented 5 months ago

For MT calls, DRAGEN has a FORMAT field named LAF, which is supposed to have one float for each alternate allele present at a site.

FORMAT=

However, that's not what happens in the VCF. It only has the fraction for the alleles SEEN IN THAT SAMPLE, or possibly only for the allele called in the VCF for that sample (I haven't looked at everything to see which it is). It doesn't really matter, because either way this invalidates both the Number (A, which codes for one value per alternate allele, whether or not it is present in the sample) and the description. If it is really only for the allele called for that sample, then Number should be "1"; if it is for the alleles seen, then Number should be ".". To make the file pass programs that validate against the VCF spec, I can change the header so that Number is ".", but I really don't like changing the description.
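A rough sketch of how to check which case it is: count the LAF values each sample carries and compare that with the number of ALT alleles at the site. The function name `laf_value_counts` and the file name in the usage note are made up for illustration; the function only assumes uncompressed VCF text on stdin with LAF somewhere in the FORMAT column.

```shell
# Hypothetical helper: for every sample at every site, report how many LAF
# values are present versus how many ALT alleles the site has.
# Reads uncompressed VCF text on stdin.
# Prints: CHROM POS sample_number n_alt n_laf
laf_value_counts() {
    awk -F'\t' '
        /^#/ { next }                          # skip header lines
        {
            n_alt = split($5, alt, ",")        # ALT alleles at this site
            n_fmt = split($9, fmt, ":")        # FORMAT keys
            laf_idx = 0
            for (i = 1; i <= n_fmt; i++) if (fmt[i] == "LAF") laf_idx = i
            if (laf_idx == 0) next             # no LAF key at this site
            for (s = 10; s <= NF; s++) {
                n_val = split($s, val, ":")
                # skip samples where LAF is missing or was dropped as a trailing field
                if (laf_idx > n_val || val[laf_idx] == ".") continue
                n_laf = split(val[laf_idx], laf, ",")
                print $1, $2, s - 9, n_alt, n_laf
            }
        }
    '
}
```

Something like `bcftools view in.vcf.gz | laf_value_counts | awk '$4 != $5'` would then list the sample/site pairs where the count disagrees with Number=A. If `n_laf` is always 1 there, that points to "allele called"; if it varies, that points to "alleles seen".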

Since Number=A, trimming alternate alleles in bcftools will cause a crash because it expects the number of values to equal the number of alternate alleles. So if Number is really supposed to be "1", then whatever: the description is wrong, but I don't care. However, if it is reporting fractions for alleles SEEN, then trimming doesn't occur for this tag and the fractions listed could be absolutely wrong, especially if a fraction was calculated for an allele that gets trimmed out of the VCF when subsetting samples. I'm going to start with Number=1; if it crashes, then I'll change Number to ".".
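A minimal sketch of the header change, assuming the LAF line starts with `##FORMAT=<ID=LAF,` and the rest of the header is left alone. The function name `fix_laf_number` and the file names in the comments are placeholders, not what the script necessarily uses.

```shell
# Hypothetical helper: rewrite only the Number of the LAF FORMAT header line.
# Reads VCF header (or whole VCF) text on stdin and writes it to stdout;
# every other line passes through unchanged, and the Description is not touched.
fix_laf_number() {
    new_number="${1:-1}"    # start with "1"; pass "." as the fallback
    sed -E "/^##FORMAT=<ID=LAF,/ s/Number=[^,>]+/Number=${new_number}/"
}

# Usage sketch with bcftools (file names are placeholders):
#   bcftools view -h in.vcf.gz | fix_laf_number 1 > header.txt
#   bcftools reheader -h header.txt -o fixed.vcf.gz in.vcf.gz
```

Keeping the Number as an argument means switching from "1" to "." later is a one-character change rather than a second edit of the header.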

Kurt-Hetrick commented 4 months ago

done