Closed jaredo closed 6 years ago
I'm going to look into this further to better understand (and recall) the problem, but a quick thing you can try is the --special-ploidy
parameter, to specify that chrX is haploid. It might work if you add the parameter as --special-ploidy chrX=1
Although I see that you have diploid genotypes in the first 2 samples, is that correct?
That is right. For regions where ploidy can vary between samples, such as non-PAR chrX in humans, we need to be flexible with the length of the PL field. You could infer the expected length of PL from the ploidy of GT.
I see. We had the impression that the specification required all samples to have the same ploidy, but it doesn't actually require that, and your point makes sense from the biological side. The specification also says that PL is expected to have the same ploidy as the GT.
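To make the per-sample rule concrete, here is a minimal sketch of inferring the expected PL length from each sample's own GT. It assumes the usual genotype-count rule for Number=G fields: with n alleles (REF plus ALTs) and ploidy p, there are C(n + p - 1, p) unordered genotypes. The function names are hypothetical and this is not the validator's actual code; missing PL values (".") are not handled.

```python
import re
from math import comb


def ploidy_from_gt(gt: str) -> int:
    """Count the alleles in a GT string such as '1/1', '0|1' or '1'."""
    return len(re.split(r"[/|]", gt))


def expected_pl_length(gt: str, n_alt: int) -> int:
    """Number of genotype likelihoods expected for one sample.

    With n = n_alt + 1 alleles (REF plus ALTs) and ploidy p, the number of
    unordered genotypes is C(n + p - 1, p).
    """
    n_alleles = n_alt + 1
    ploidy = ploidy_from_gt(gt)
    return comb(n_alleles + ploidy - 1, ploidy)


def pl_matches_gt(gt: str, pl: str, n_alt: int) -> bool:
    """Check one sample's PL field against the ploidy of its own GT."""
    return len(pl.split(",")) == expected_pl_length(gt, n_alt)


# Illustrative values: a diploid and a haploid call at a biallelic site.
print(expected_pl_length("1/1", n_alt=1))    # diploid sample
print(expected_pl_length("1", n_alt=1))      # haploid sample
print(pl_matches_gt("1", "231,0", n_alt=1))  # haploid GT with two PL values
```

Under this rule a diploid sample at a biallelic site carries three PL values and a haploid sample carries two, so both can coexist in the same record.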
You can expect we will allow this, but we haven't scheduled it yet.
Related discussion taking place in https://github.com/samtools/hts-specs/issues/272
In #114 we introduced a fix for this; the next version of the validator will accept that VCF as valid:
$ cat example.vcf
##fileformat=VCFv4.3
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00096 HG00097 HG00098
chrX 10980118 rs1265885 C T 1572 PASS SNVHPOL=2;MQ=60 GT:GQ:GQX:DP:DPF:AD:ADF:ADR:SB:FT:PL 1/1:159:30:54:0:0,54:0,27:0,27:-88.3:PASS:370,163,0 1/1:135:30:46:0:0,46:0,23:0,23:-77.1:PASS:370,138,0 1:224:30:10:0:0,10:0,6:0,4:-24.3:PASS:231,0
$ ./vcf_validator -i example.vcf
[info] Reading from input file...
[info] According to the VCF specification, the input file is valid
$ cat example.vcf.errors_summary.1519743173046.txt
According to the VCF specification, the input file is valid
Warning: A valid 'reference' entry is not listed in the meta section. This occurs 1 time(s), first time in line 3.
Warning: Chromosome/contig 'chrX' is not described in a 'contig' meta description. This occurs 1 time(s), first time in line 3.
Of course, those warnings are there because I didn't include any meta-information; I used a minimal header.
thanks!
Hello! Thanks for your work on this.
I think there is an issue when validating FORMAT/PL for non-diploid genotypes. Consider the following region on chromosome X:
I think the number of PL values for the male haploid sample should be equal to the number of alleles, i.e. 2.
thanks
Jared