Open tylerwmarrs opened 8 years ago
Thank you for noticing. I will address this shortly.
If you are interested, I created my own web service that uses a fully indexed FASTA reference to convert to VCF. It determines the appropriate reference version to use and is pretty fast. One thing I noticed is that your custom reference is missing around 8k variants when run on my 23andMe data.
Anyway, here is the link if you would like to check it out:
It appears that you used UCSC's hg19 reference instead of NCBI's GRCh37 to build your own reference. Normally this is fine, but the two builds differ on chromosome MT. For example, the UCSC genome browser shows a T at MT:150 instead of a C.
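You can use bcftools to validate the REF alleles in the VCF against a reference. With the -c e option, bcftools norm exits with an error when it finds a REF allele that does not match the FASTA.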
bcftools norm -ce -f /reference/homo.sapiens/GRCh37/Homo_sapiens_assembly19.fasta 23andme.vcf
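For anyone curious what that check amounts to, here is a rough Python sketch of the same idea: compare each record's REF allele with the bases at that position in the reference. This is not how bcftools is implemented. The function name find_ref_mismatches, the in-memory reference dict, and the toy contig and records are all made up for illustration; a real run would read the sequences from an indexed FASTA.

```python
def find_ref_mismatches(vcf_lines, reference):
    """Return (chrom, pos, vcf_ref, fasta_ref) for every VCF record whose
    REF allele disagrees with the reference sequence.

    `reference` is a hypothetical dict mapping contig name -> sequence string.
    fasta_ref is None when the contig is not present in the reference.
    """
    mismatches = []
    for line in vcf_lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3].upper()
        seq = reference.get(chrom)
        if seq is None:
            mismatches.append((chrom, pos, ref, None))
            continue
        # VCF positions are 1-based; Python string slicing is 0-based.
        start = pos - 1
        fasta_ref = seq[start:start + len(ref)].upper()
        if fasta_ref != ref:
            mismatches.append((chrom, pos, ref, fasta_ref))
    return mismatches


# Toy data only: an invented contig and two invented records.
reference = {"toy": "ACGTACGTAC"}
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "toy\t3\t.\tG\tA\t.\t.\t.",  # REF matches the reference base
    "toy\t5\t.\tC\tT\t.\t.\t.",  # REF is C, but the reference base is A
]

for chrom, pos, vcf_ref, fasta_ref in find_ref_mismatches(vcf_lines, reference):
    print(f"{chrom}:{pos} VCF REF={vcf_ref} reference={fasta_ref}")
```

A VCF built against a different build would show up here as mismatches at positions where the two references differ, which is roughly what the bcftools command above reports.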