Closed by FranciscoAscue 1 year ago
The GL field is supposed to contain genotype likelihoods. For a diploid individual at a biallelic site there should be 3 values, one for each of the 3 possible genotypes (AA/AB/BB). My guess is that Stacks outputs something slightly different and overloads the GL field. That makes the field incompatible with downstream tools and with the VCF specification. You can remove the GL field from the VCF file with bcftools:
bcftools annotate -x FORMAT/GL file.vcf.gz
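To see whether a GL value has the expected shape before dropping the field, a small check like the one below may help. It is only a sketch: the function names `expected_gl_count` and `gl_values_ok` are made up for illustration, the sample strings in the usage lines are hypothetical (they are not taken from the Stacks output), and the genotype count uses the standard "combinations with repetition" formula rather than anything specific to Stacks or the validator.

```python
from math import comb  # available in Python 3.8+


def expected_gl_count(n_alt_alleles, ploidy=2):
    """Number of possible genotypes, i.e. the number of GL values expected."""
    n_alleles = n_alt_alleles + 1  # REF plus every ALT allele
    # Unordered genotypes of `ploidy` alleles drawn from `n_alleles` alleles.
    return comb(n_alleles + ploidy - 1, ploidy)


def gl_values_ok(format_field, sample_field, n_alt_alleles, ploidy=2):
    """Return True if GL is absent, missing, or has the expected number of values."""
    keys = format_field.split(":")
    if "GL" not in keys:
        return True
    values = sample_field.split(":")
    idx = keys.index("GL")
    # Trailing fields may be dropped, and "." marks a missing value.
    if idx >= len(values) or values[idx] == ".":
        return True
    return len(values[idx].split(",")) == expected_gl_count(n_alt_alleles, ploidy)


# Hypothetical sample columns, for illustration only.
print(gl_values_ok("GT:DP:GL", "0/0:1:-0.05,-1.2,-3.4", n_alt_alleles=1))  # three values
print(gl_values_ok("GT:DP:GL", "0/0:1:-0.05,-1.2", n_alt_alleles=1))       # two values
```

For a diploid sample at a biallelic site `expected_gl_count(1)` is 3, matching the AA/AB/BB case described above.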
@tcezard thanks for the advice. After running the validator and debugulator again I no longer get any problems with GL, but they still report the following (only the Duplicated errors):
According to the VCF specification, the input file is not valid
Warning: A valid 'reference' entry is not listed in the meta section. This occurs 1 time(s), first time in line 3162.
Error: Contig is not sorted by position. This occurs 10927 time(s), first time in line 3163.
Error: Duplicated variant NT_174338.1:4997:A>C found. This occurs 2 time(s), first time in line 3271.
Error: Duplicated variant NT_174393.1:289:C>G found. This occurs 2 time(s), first time in line 3275.
Error: Duplicated variant NT_174582.1:487:C>A found. This occurs 2 time(s), first time in line 3301.
Error: Duplicated variant NT_174766.1:304:G>C found. This occurs 2 time(s), first time in line 3334.
Error: Duplicated variant NT_174766.1:12005:A>C found. This occurs 2 time(s), first time in line 3339.
Error: Duplicated variant NT_174872.1:4926:G>A found. This occurs 2 time(s), first time in line 3392.
Error: Duplicated variant NT_175047.1:3882:A>C found. This occurs 2 time(s), first time in line 3445.
.
.
.
This is part of the VCF:
NT_174338.1 4997 5233:76:+ A C . PASS NS=40;AF=0.175 GT:DP:AD:GQ 0/0:1:1,0:21 1/1:1:0,1:14 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 1/1:1:0,1:14 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 1/1:1:0,1:14 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 1/1:1:0,1:14 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 1/1:1:0,1:14 0/0:1:1,0:21 1/1:2:0,2:18 1/1:1:0,1:14 0/0:1:1,0:21 0/0:1:1,0:21
NT_174338.1 4997 5235:13:- A C . PASS NS=40;AF=0.25 GT:DP:AD:GQ 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 1/1:1:0,1:16 1/1:1:0,1:16 0/0:1:1,0:21 0/0:1:1,0:21 1/1:1:0,1:16 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 1/1:1:0,1:16 0/0:1:1,0:21 0/0:1:1,0:21 1/1:1:0,1:16 0/0:1:1,0:21 0/0:1:1,0:21 0/0:1:1,0:21 1/1:1:0,1:16 1/1:1:0,1:16 0/0:2:2,0:25 0/0:1:1,0:21 1/1:1:0,1:16 0/0:1:1,0:21 0/0:1:1,0:21 1/1:1:0,1:16 1/1:1:0,1:16 0/0:1:1,0:21 0/0:1:1,0:21
NT_174393.1 289 5563:262:+ C G . PASS NS=36;AF=0.194 GT:DP:GQ 0/0:1:23 0/0:1:23 0/0:1:23 0/0:1:23 0/0:1:23 0/0:1:23 0/0:1:23 ./.:.:. 0/0:1:23 0/0:1:23 0/0:1:23 0/0:1:23 0/0:1:23 1/1:1:17 1/1:1:17 0/0:2:26 ./.:.:. 0/0:1:23 0/0:1:23 0/0:1:23 0/0:1:23 0/0:1:23 0/0:1:23 ./.:.:. 0/0:1:23 0/0:1:23 1/1:1:17 0/0:1:23 0/0:1:23 0/0:1:23 0/0:1:23 0/0:1:23 1/1:1:17 1/1:1:17 0/0:1:23 0/0:1:23 ./.:.:. 1/1:1:17 1/1:1:17 0/0:1:23
NT_174393.1 289 5564:43:- C G . PASS NS=35;AF=0.229 GT:DP:GQ 0/0:1:21 0/0:1:21 0/0:1:21 0/0:1:21 0/0:1:21 0/0:2:24 0/0:1:21 ./.:.:. 0/0:1:21 0/0:1:21 0/0:1:21 1/1:1:15 0/0:1:21 1/1:1:15 1/1:1:15 0/0:1:21 0/0:1:21 1/1:1:15 0/0:1:21 0/0:1:21 0/0:1:21 0/0:1:21 0/0:1:21 ./.:.:. 1/1:1:15 0/0:1:21 0/0:1:21 1/1:1:15 1/1:1:15 0/0:1:21 0/0:1:21 0/0:1:21 1/1:1:15 ./.:.:. 0/0:1:21 0/0:1:21 ./.:.:. 0/0:1:21 0/0:1:21 ./.:.:.
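In the excerpt above, each duplicated pair shares CHROM, POS, REF and ALT and differs only in the ID column (for example `5233:76:+` and `5235:13:-`). A rough way to list such records, and to spot positions that go backwards within a contig, is sketched below. The function name `check_vcf_records` is made up for illustration, and the logic only approximates what the validator reports; it is not the validator's own algorithm. Sample columns are omitted from the example lines for brevity.

```python
from collections import Counter


def check_vcf_records(lines):
    """Return (duplicates, unsorted) for an iterable of VCF text lines.

    duplicates: {(chrom, pos, ref, alt): count} for keys seen more than once
    unsorted:   [(line_number, chrom, pos)] where pos is lower than the
                previous record on the same contig
    """
    seen = Counter()
    last_pos = {}
    unsorted = []
    for line_no, line in enumerate(lines, start=1):
        # Skip blank lines and header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        # Only the first five columns are needed; split() accepts tabs or spaces.
        chrom, pos, _id, ref, alt = line.split()[:5]
        pos = int(pos)
        seen[(chrom, pos, ref, alt)] += 1
        if chrom in last_pos and pos < last_pos[chrom]:
            unsorted.append((line_no, chrom, pos))
        last_pos[chrom] = pos
    duplicates = {key: n for key, n in seen.items() if n > 1}
    return duplicates, unsorted


# The first eight columns of the four records shown above.
example = [
    "##fileformat=VCFv4.2",
    "NT_174338.1\t4997\t5233:76:+\tA\tC\t.\tPASS\tNS=40;AF=0.175",
    "NT_174338.1\t4997\t5235:13:-\tA\tC\t.\tPASS\tNS=40;AF=0.25",
    "NT_174393.1\t289\t5563:262:+\tC\tG\t.\tPASS\tNS=36;AF=0.194",
    "NT_174393.1\t289\t5564:43:-\tC\tG\t.\tPASS\tNS=35;AF=0.229",
]
duplicates, unsorted = check_vcf_records(example)
for (chrom, pos, ref, alt), count in duplicates.items():
    print(f"{chrom}:{pos}:{ref}>{alt} appears {count} times")
```

On a real file the same function can be fed an open file handle, for example `check_vcf_records(open("file.vcf"))`, where `file.vcf` stands for the uncompressed VCF.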
The input data was filtered by MAF and missing data, but the errors remain.
I worked with Stacks 2 and want to submit this data to EBI. I get the following errors after running vcf_validator_linux and vcf_debugulator_linux:
I don't know whether the tools for handling VCF files recommended on the EBI submissions help page are mandatory, given that Stacks generates VCF files directly. Any insight about the error above would be helpful for us.
P.S.
I worked with 40 individuals of Cavia porcellus from RAD-seq sequencing and used the scaffold-level genome assembly as the reference (cavpor3).