GoogleCodeExporter closed this issue 8 years ago
I think the main problem with VCF 4.1 is that the "Phred-scaled likelihoods"
(PL) field is no longer a vector with a fixed number of integers; its length is
now the number of possible genotypes (as explained in
http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41).
Something similar is true for the "Allele count" (AC) and "Allele Frequency"
(AF) fields, whose length is the number of alternate alleles. More generally,
the Number attribute of a header line can now be "A" (one value per alternate
allele) or "G" (one value per possible genotype), but pysam does not seem to
accept that at the moment.
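To make the length rule concrete, here is a minimal Python sketch of how many values a field should carry for a given Number. The function name expected_value_count and the ploidy parameter are made up for illustration and are not part of pysam or PyVCF; it assumes Python 3.8+ for math.comb.

```python
from math import comb


def expected_value_count(number, n_alt, ploidy=2):
    """Return how many values a field with this Number should carry.

    number: the Number attribute from the header ("A", "G", "." or an integer string)
    n_alt:  how many alternate alleles the record lists in ALT
    ploidy: assumed ploidy of the samples (hypothetical parameter, default diploid)
    """
    if number == "A":
        # One value per alternate allele.
        return n_alt
    if number == "G":
        # One value per possible genotype: unordered choices of `ploidy`
        # alleles, with repetition, out of REF plus all ALT alleles.
        n_alleles = n_alt + 1
        return comb(n_alleles + ploidy - 1, ploidy)
    if number == ".":
        # Length is unknown or unbounded.
        return None
    return int(number)
```

For diploid samples this gives 3 PL values at a biallelic site and 6 at a triallelic one, which is why a parser that assumes a fixed length trips over such records.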
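A dirty workaround is to drop the multiallelic sites (any record with more than
one ALT allele, e.g. triallelic sites) from the VCF file and rewrite the
Number values, with a command like this:
sed -e 's/Number=A/Number=1/g' -e 's/Number=G/Number=3/g' input.vcf |
  awk '$0~"^#" || $5!~","' > output.vcf
Once only biallelic sites remain, Number=A fields hold one value and Number=G
fields hold three (for diploid samples), which is what the sed substitutions
declare. The new file should be parsable by pysam.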
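For anyone without sed/awk at hand, roughly the same filter can be sketched in plain Python. The function name drop_multiallelic is hypothetical, and unlike the sed command this sketch only rewrites Number in header lines and splits on tabs rather than any whitespace.

```python
def drop_multiallelic(lines):
    """Yield header lines with Number=A/G rewritten, plus single-ALT records only."""
    for line in lines:
        if line.startswith("#"):
            # Header line: declare fixed lengths that hold for biallelic,
            # diploid records (1 value per ALT allele, 3 genotypes).
            line = line.replace("Number=A", "Number=1")
            line = line.replace("Number=G", "Number=3")
            yield line
        else:
            fields = line.split("\t")
            # ALT is the fifth column; a comma means more than one ALT allele.
            alt = fields[4] if len(fields) > 4 else ""
            if "," not in alt:
                yield line


def filter_vcf(in_path, out_path):
    """Apply drop_multiallelic to an uncompressed VCF file (paths are placeholders)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(drop_multiallelic(src))
```

Calling filter_vcf("input.vcf", "output.vcf") should produce a file comparable to the output of the shell pipeline above, with the same caveat that every multiallelic site is lost.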
Original comment by giulio.g...@gmail.com
on 7 Jun 2012 at 6:25
We have support for this in HEAD in PyVCF:
http://pyvcf.readthedocs.org/en/latest/index.html
Original comment by cas...@gmail.com
on 18 Jun 2012 at 9:59
Thanks, I have reworked the vcf module a little. It should now support more VCF
flavours. However, it is still experimental.
Best wishes,
Andreas
Original comment by andreas....@gmail.com
on 16 Aug 2012 at 8:00
Original issue reported on code.google.com by
nlomi...@googlemail.com
on 24 Jan 2012 at 1:28