etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org
Other

tabio: Implement generic SNV/SV/CNV "vcf" writer #231

Open etal opened 7 years ago

etal commented 7 years ago

In one approach (fmt="vcf-simple"), use pandas instead of pysam to parse the VCF as a tabular file, but don't further parse the INFO and sample columns. This lets us read an arbitrary VCF, manipulate / subset it as an array of loci, and write it to another VCF file. Also, keep the complete VCF header in the GenomicArray.meta attribute, and use it when writing the object out.
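A minimal sketch of the "vcf-simple" idea, assuming pandas is available. The function names `read_vcf_simple` and `write_vcf_simple` are hypothetical and are not part of CNVkit's `tabio` API. The header is kept in a plain list here; under the proposal it would be stored in `GenomicArray.meta`. Every field is read as a string, so INFO and sample columns pass through unparsed and a read-then-write round trip reproduces the records verbatim.

```python
import csv
import io

import pandas as pd


def read_vcf_simple(handle):
    """Read a VCF as a plain table, leaving INFO and sample columns unparsed.

    Returns (table, header_lines). `table` has one string column per VCF
    column. `header_lines` holds every '##' meta line plus the '#CHROM' line
    verbatim, so the file can be written back out unchanged.
    """
    header_lines = []
    while True:
        line = handle.readline()
        if not line:
            raise ValueError("No '#CHROM' header line found")
        line = line.rstrip("\r\n")
        header_lines.append(line)
        if line.startswith("#CHROM"):
            columns = line[1:].split("\t")
            break
        if not line.startswith("##"):
            raise ValueError("Unexpected line before '#CHROM': " + line)
    try:
        # dtype=str and keep_default_na=False keep each field as the exact
        # text from the file ('.' stays '.', no numeric conversion).
        # QUOTE_NONE stops '"' inside INFO from being treated as quoting.
        table = pd.read_csv(handle, sep="\t", header=None, names=columns,
                            dtype=str, keep_default_na=False,
                            quoting=csv.QUOTE_NONE)
    except pd.errors.EmptyDataError:
        # A VCF with a header but no records.
        table = pd.DataFrame(columns=columns, dtype=str)
    return table, header_lines


def write_vcf_simple(table, header_lines, handle):
    """Write the stored header, then each row as tab-separated text."""
    for line in header_lines:
        handle.write(line + "\n")
    # Join the string fields directly instead of using to_csv, so no
    # quoting or escaping is ever applied to INFO or sample values.
    for row in table.itertuples(index=False, name=None):
        handle.write("\t".join(row) + "\n")


if __name__ == "__main__":
    vcf_text = (
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
        "chr1\t100\t.\tA\tG\t50\tPASS\tDP=10\n"
        "chr2\t200\t.\tC\tT\t.\tPASS\tDP=5\n"
    )
    table, header = read_vcf_simple(io.StringIO(vcf_text))
    # Subset the loci like an ordinary table, then write a new VCF.
    chr1_only = table[table["CHROM"] == "chr1"]
    out = io.StringIO()
    write_vcf_simple(chr1_only, header, out)
    print(out.getvalue(), end="")
```

Because nothing below the column level is parsed, this approach cannot validate or filter on individual INFO keys; that is the trade-off for accepting an arbitrary VCF without losing any of its content.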

kyleabeauchamp commented 7 years ago

FWIW, the pysam developers are currently working to improve the VCF parsing, so if you have a wishlist you might open a ticket there.

etal commented 7 years ago

Good to know. This isn't so much a wishlist item as a lack of initiative on my part to ensure that round-tripping an arbitrary VCF through pysam (reading it in and writing it back out) preserves all of the original information.