Closed timothymillar closed 2 months ago
This can be more or less achieved with vcfallelicprimitives from vcflib. It drops a lot of metadata and doesn't make use of the PS tag, but it's probably a better option than writing another tool.
Reopening this as vcfallelicprimitives is a bit limited. It would be good to convert more of the metadata from wide to long format.
There are some more difficult things to carry over, like the MCMC QC metrics. It would also be nice if we could somehow carry over variant depths, but that would require storing an array of SNP depths in the original output, which would need to be optional if included at all. Perhaps there could be a general --snp-metrics option to include arrays of more detailed output on single SNP positions?
Done in v0.10.0
See #18 for a description of wide vs long format. Currently the assemble program outputs wide format VCF files, i.e. each line contains a full haplotype block. This is the most suitable output for the tool, as it gives posterior probabilities etc. for full haplotypes.
Long format VCF files (phased SNPs) would be useful and these can be generated by "atomizing" the haplotypes in the wide format VCF. This process will likely result in removal of some information relating to the full haplotype.
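As a rough illustration of what "atomizing" could look like, here is a minimal Python sketch. It is not the tool's implementation: the function name `atomize`, its arguments and the returned tuple layout are all hypothetical, and it assumes every haplotype allele in a block has the same length as the reference allele (SNPs only, no indels). Using the block start position as the PS value is likewise just an assumption for the example.

```python
def atomize(pos, ref, alts, genotype):
    """Split one wide-format haplotype record into phased per-SNP records.

    Hypothetical helper for illustration only.
    pos      : 1-based position of the first base of the haplotype block
    ref      : reference haplotype string
    alts     : list of alternate haplotype strings (same length as ref)
    genotype : allele index for each chromosome copy (0 = ref)
    Returns a list of (pos, ref_base, alt_bases, phased_gt, ps) tuples.
    """
    alleles = [ref] + list(alts)
    if any(len(a) != len(ref) for a in alleles):
        raise ValueError("this sketch only handles equal-length alleles")

    records = []
    for offset, ref_base in enumerate(ref):
        # Base carried by each called haplotype at this column of the block.
        called = [alleles[g][offset] for g in genotype]

        # Alternate bases in order of first appearance among called haplotypes.
        alt_bases = []
        for base in called:
            if base != ref_base and base not in alt_bases:
                alt_bases.append(base)
        if not alt_bases:
            continue  # no called haplotype differs from the reference here

        snp_alleles = [ref_base] + alt_bases
        phased_gt = "|".join(str(snp_alleles.index(b)) for b in called)
        # Assumption: reuse the block start as the phase set (PS) identifier.
        records.append((pos + offset, ref_base, alt_bases, phased_gt, pos))
    return records


# A tetraploid call with haplotypes ACGT, ATGT, ATGA, ATGA
for record in atomize(100, "ACGT", ["ATGT", "ATGA"], (0, 1, 2, 2)):
    print(record)
```

Per-haplotype information such as posterior probabilities has no obvious per-SNP equivalent, which is why this sketch simply drops it.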