FelixKrueger / SNPsplit

Allele-specific alignment sorting
http://felixkrueger.github.io/SNPsplit/
GNU General Public License v3.0

Relax genome preparation filtering criteria #8

Closed FelixKrueger closed 7 years ago

FelixKrueger commented 7 years ago

Historically, we filtered out any position in a VCF file where the alternative allele was not defined as a single base (probably a relic from the days when there was one VCF file per strain). For the current Mouse Genomes Project VCF file this seems overly harsh, though, since different strains may be homozygous for different bases at the same position.

Here is an example:

chr   pos      REF   ALT   GT strain1   GT strain2   GT strain3
1     135446   G     A,T   0/0          2/2          1/1

Here all three strains are homozygous: strain1 has the same sequence as the reference, i.e. G/G, strain2 is T/T and strain3 is A/A. Can we please include positions with multiple variants like this as valid positions for the genome preparation?
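
To make the request concrete, here is a minimal Python sketch of how the genotype index could be resolved against a comma-separated ALT field for the example record above. It is only an illustration and not SNPsplit's actual code; the function name `homozygous_base` and the rule of keeping only homozygous single-base calls are assumptions made for this example.

```python
def homozygous_base(ref, alt, genotype):
    """Return the base a strain carries for a homozygous SNP call, else None.

    Hypothetical helper for illustration only (not part of SNPsplit).
    """
    # Allele index 0 is REF; indices 1..n are the comma-separated ALT alleles.
    alleles = [ref] + alt.split(",")
    # Use only the GT sub-field and accept phased ("|") or unphased ("/") calls.
    calls = genotype.split(":")[0].replace("|", "/").split("/")
    if len(calls) != 2 or calls[0] != calls[1] or not calls[0].isdigit():
        return None  # heterozygous, missing ("./.") or malformed genotype
    index = int(calls[0])
    if index >= len(alleles):
        return None  # genotype points at an allele that is not listed
    base = alleles[index].upper()
    # Keep single-nucleotide alleles only; indels are skipped.
    if len(ref) != 1 or base not in {"A", "C", "G", "T"}:
        return None
    return base


# The example record from this issue: REF = G, ALT = A,T
record = {"ref": "G", "alt": "A,T",
          "strain1": "0/0", "strain2": "2/2", "strain3": "1/1"}
for strain in ("strain1", "strain2", "strain3"):
    print(strain, homozygous_base(record["ref"], record["alt"], record[strain]))
```

With this rule, strain1 resolves to G, strain2 to T and strain3 to A, so the position stays usable for every strain even though ALT lists two alleles.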

FelixKrueger commented 7 years ago

I have now added support for multiple homozygous variants to the genome processing. Added in 3dba9a0b337001482e24f0f247ac1bfdcfda2cea.