arrogantrobot / 23andme2vcf

convert your 23andme raw file to VCF | DEPRECATED, please see https://github.com/plantimals/2vcf
MIT License

MT reference incorrect #14

Open tylerwmarrs opened 8 years ago

tylerwmarrs commented 8 years ago

It appears that you used UCSC's hg19 reference instead of NCBI's GRCh37 to build your own reference. Normally this is fine, but the two builds differ on chromosome MT. For example, the UCSC genome browser shows a T at MT:150 instead of a C.

You can use bcftools to validate against a reference.

bcftools norm -ce -f /reference/homo.sapiens/GRCh37/Homo_sapiens_assembly19.fasta 23andme.vcf
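For anyone without bcftools at hand, the same kind of check can be sketched in a few lines of Python. This is only an illustration of the idea behind `-c e` (compare each record's REF allele with the bases at that position in the FASTA), not what bcftools does internally. The function names `read_fasta` and `find_ref_mismatches` and the toy sequence and records are made up for this example, and it assumes the contig names in the VCF match those in the FASTA.

```python
import io


def read_fasta(handle):
    """Load a FASTA file into {contig_name: sequence}.

    Holds everything in memory, so it only suits small references
    (for example MT alone); a whole genome needs an indexed reader.
    """
    sequences, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                sequences[name] = "".join(chunks)
            # Contig name is the first word after '>'.
            name = line[1:].split()[0]
            chunks = []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def find_ref_mismatches(vcf_handle, reference):
    """Yield (chrom, pos, vcf_ref, fasta_ref) where REF disagrees with the FASTA."""
    for line in vcf_handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        seq = reference.get(chrom)
        if seq is None:
            continue  # contig not present in this reference
        # VCF positions are 1-based; Python slices are 0-based.
        fasta_ref = seq[pos - 1 : pos - 1 + len(ref)]
        if fasta_ref.upper() != ref.upper():
            yield chrom, pos, ref, fasta_ref


# Toy data (hypothetical, not real MT sequence or real 23andMe output).
toy_fasta = io.StringIO(">MT\nGATCACAGGT\n")
toy_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "MT\t3\t.\tT\tC\t.\t.\t.\n"
    "MT\t5\t.\tC\tT\t.\t.\t.\n"
)

reference = read_fasta(toy_fasta)
mismatches = list(find_ref_mismatches(toy_vcf, reference))
print(mismatches)  # [('MT', 5, 'C', 'A')]
```

A VCF built from an hg19-derived MT sequence and checked this way against a GRCh37 FASTA should report the positions where the two MT sequences differ, which is what the bcftools command above stops on.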

arrogantrobot commented 8 years ago

Thank you for noticing. I will address this shortly.

tylerwmarrs commented 8 years ago

If you are interested, I created my own web service that uses a full, indexed FASTA reference to convert to VCF. It determines the appropriate reference version to use and is pretty fast. One thing I noticed is that your custom reference is missing around 8k of the variants in my 23andMe data.

Anyway, here is the link if you would like to check it out:

http://23converter.tylermarrs.com