atks / vt

A tool set for short variant discovery in genetic sequence data.
http://genome.sph.umich.edu/wiki/vt
MIT License

Subset should modify multiallelic sites #27

Open AlistairNWard opened 9 years ago

AlistairNWard commented 9 years ago

When subsetting a VCF file, it would be useful to trim from each multiallelic site any alleles that are not present in the samples being retained. For example, consider the following entry:

1 100 . CTTT CT,C 100 PASS ... 0/2 ...

The genotype of the retained sample is 0/2, so the record must be kept when subsetting, but there is no need to retain the 'CT' allele. Dropping it means the genotype has to be renumbered (0/2 becomes 0/1), and every INFO field that carries one entry per alternate allele has to be trimmed as well.
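To make the request concrete, here is a minimal Python sketch of the trimming logic described above. It is not vt code: the function name `trim_unused_alts` and its arguments are hypothetical, and the AC values in the usage note are made up for illustration. It only handles GT and INFO fields with one value per ALT allele (Number=A); per-allele fields that include REF (Number=R) and per-genotype fields such as PL would need their own handling.

```python
import re


def trim_unused_alts(alts, genotypes, per_alt_info=None):
    """Drop ALT alleles that no retained genotype references, then renumber.

    alts          -- ALT allele strings in record order, e.g. ["CT", "C"]
    genotypes     -- GT strings of the retained samples, e.g. ["0/2"]
    per_alt_info  -- hypothetical dict: INFO key -> list with one value per ALT
    """
    per_alt_info = per_alt_info or {}

    # Collect the allele indices in use (0 = REF, i = i-th ALT); skip missing calls.
    used = set()
    for gt in genotypes:
        for allele in re.split(r"[/|]", gt):
            if allele != ".":
                used.add(int(allele))

    # ALT indices to keep, preserving their original order.
    kept = [i for i in range(1, len(alts) + 1) if i in used]

    # Map old allele indices to new ones; REF always stays 0.
    remap = {0: 0}
    for new_index, old_index in enumerate(kept, start=1):
        remap[old_index] = new_index

    def renumber(gt):
        # Rewrite each allele index, leaving '/', '|' and '.' untouched.
        return re.sub(r"\d+", lambda m: str(remap[int(m.group())]), gt)

    new_alts = [alts[i - 1] for i in kept]
    new_gts = [renumber(gt) for gt in genotypes]
    new_info = {key: [vals[i - 1] for i in kept] for key, vals in per_alt_info.items()}
    return new_alts, new_gts, new_info
```

For the record above, `trim_unused_alts(["CT", "C"], ["0/2"], {"AC": [3, 1]})` returns `(["C"], ["0/1"], {"AC": [1]})`. If no ALT allele survives, the returned list is empty and the caller would have to decide whether to write `.` or drop the record.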

It is possible to use 'vt decompose | vt subset' to remove the alternate allele that is not present, but decomposing modifies the values supplied in the genotype fields, so it is not necessarily a desirable solution.

atks commented 9 years ago

It might be a good idea to add an option to vt subset to do this.