Open etal opened 7 years ago
FWIW, the pysam developers are currently working to improve the VCF parsing, so if you have a wishlist you might open a ticket there.
Good to know. This isn't so much a wishlist item as a lack of initiative on my part: I haven't verified that round-tripping an arbitrary VCF through pysam preserves all the original info.
One approach (fmt="vcf-simple"): use pandas instead of pysam to parse the VCF as a tab-delimited table, without further parsing the INFO and sample columns. This lets us read an arbitrary VCF, manipulate or subset it as an array of loci, and write it to another VCF file. Also, keep the complete VCF header in the GenomicArray.meta attribute, and use it when writing the object out.
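A rough sketch of what that could look like, using only pandas and the standard library. The function names `read_vcf_simple` and `write_vcf_simple` are hypothetical, not existing API, and the plain list of `##` header lines stands in for whatever would actually be stored in GenomicArray.meta. Every column is kept as a string so that INFO and sample fields pass through untouched:

```python
import csv
import io

import pandas as pd


def read_vcf_simple(handle):
    """Hypothetical helper: split a VCF into (header lines, table of loci).

    INFO, FORMAT and sample columns are kept as opaque strings.
    """
    meta_lines = []   # '##' lines, kept verbatim for writing back out
    columns = None    # names taken from the '#CHROM ...' line
    body = []
    for line in handle:
        if line.startswith("##"):
            meta_lines.append(line.rstrip("\n"))
        elif line.startswith("#"):
            columns = line.rstrip("\n").lstrip("#").split("\t")
        else:
            body.append(line)
    if columns is None:
        raise ValueError("no '#CHROM' column header line found")
    if not body:
        return meta_lines, pd.DataFrame(columns=columns, dtype=str)
    table = pd.read_csv(
        io.StringIO("".join(body)),
        sep="\t",
        names=columns,
        header=None,
        dtype=str,               # no type inference: "." stays "."
        na_filter=False,         # don't turn any field into NaN
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
    )
    return meta_lines, table


def write_vcf_simple(meta_lines, table, handle):
    """Hypothetical helper: write the header lines and table back as VCF text."""
    for line in meta_lines:
        handle.write(line + "\n")
    handle.write("#" + "\t".join(table.columns) + "\n")
    for row in table.itertuples(index=False, name=None):
        handle.write("\t".join(row) + "\n")


# Small made-up example to check the round trip
vcf_text = (
    "##fileformat=VCFv4.2\n"
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth">\n'
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=30\tGT:AD\t0/1:15,15\n"
    "chr2\t200\trs1\tC\tT\t.\tPASS\tDP=12;FOO\tGT:AD\t1/1:0,12\n"
)
meta, loci = read_vcf_simple(io.StringIO(vcf_text))
chr1_only = loci[loci["CHROM"] == "chr1"]  # subset as an array of loci

out = io.StringIO()
write_vcf_simple(meta, loci, out)
roundtrip_ok = out.getvalue() == vcf_text
```

Since POS is read as a string here, any positional filtering would need an explicit conversion first (e.g. `loci["POS"].astype(int)`); that is the trade-off for not letting pandas reinterpret any field.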