23andMe / yhaplo

Identifying Y-chromosome haplogroups in arbitrarily large samples of sequenced or genotyped men

Run yhaplo on b38 VCF files #17

Closed by dwuab 3 years ago

dwuab commented 3 years ago

I have a .vcf file of Y SNPs aligned to the b38 reference genome. However, I notice that several files in the input directory are based on b37 coordinates. Any advice on how to deal with b38 Y SNPs? Should I lift over from b38 to b37 first and then run yhaplo, or is there another workflow? Thanks!

dpoznik commented 3 years ago

I imagine a b38→b37 LiftOver should do the trick.

Alternatively, it looks like ISOGG lists b38 coordinates for all of these SNPs on the spreadsheet linked from this page: https://isogg.org/tree/ISOGG_YDNA_SNP_Index.html

So you could read in the mapping and replace the b37 coordinate values in your local version of yhaplo's input/isogg.* files. One caveat is that input/isogg.2016.01.04.txt has some formatting issues, as it was copied directly from the ISOGG website at the time, which might make it hard to edit. When yhaplo runs, it cleans and processes this file, so the output file output/isogg.snps.unique.2016.01.04.txt may make for a better starting point.
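For illustration, here is a minimal sketch of that coordinate swap in Python. It is not part of yhaplo. The helper names (`load_position_map`, `swap_positions`) are hypothetical, and the column indices are parameters because the layouts of the ISOGG spreadsheet export and of the cleaned SNP file are assumptions here, so check them against your own files first. The SNP names and positions in the demo are toy values.

```python
import csv
import io


def load_position_map(handle, b37_col, b38_col, delimiter="\t"):
    """Build {b37_position: b38_position} from a delimited export of the mapping."""
    mapping = {}
    reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in reader:
        try:
            mapping[int(row[b37_col])] = int(row[b38_col])
        except (ValueError, IndexError):
            continue  # header, blank line, or SNP lacking one of the coordinates
    return mapping


def swap_positions(handle, mapping, pos_col, delimiter="\t"):
    """Return (updated_rows, unmapped_rows) after replacing b37 positions with b38."""
    updated, unmapped = [], []
    reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in reader:
        try:
            b37 = int(row[pos_col])
        except (ValueError, IndexError):
            updated.append(row)  # keep headers and other non-data lines as they are
            continue
        if b37 in mapping:
            row[pos_col] = str(mapping[b37])
            updated.append(row)
        else:
            unmapped.append(row)  # set aside SNPs that have no b38 coordinate
    return updated, unmapped


# Toy data only: assumed layout is name, b37, b38 for the mapping
# and name, haplogroup, position for the SNP file.
toy_map = io.StringIO("name\tb37\tb38\nSNP_A\t100\t150\nSNP_B\t200\t\n")
toy_snps = io.StringIO("SNP_A\tHG1\t100\nSNP_B\tHG2\t200\n")

mapping = load_position_map(toy_map, b37_col=1, b38_col=2)
updated, unmapped = swap_positions(toy_snps, mapping, pos_col=2)
```

Rows without a b38 coordinate are returned separately rather than silently kept with their b37 position, so that mixed-build coordinates do not end up in the same file.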

LiftOver is probably easier, if that works :)
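If you go the LiftOver route, note that the UCSC `liftOver` command-line tool works on BED intervals rather than VCF records, so the positions have to be converted first. Below is a rough sketch of that conversion, assuming a plain-text VCF that contains only SNPs. The function name `vcf_to_bed_lines` is hypothetical, the `chr` prefix is added on the assumption that the chain file uses UCSC-style chromosome names, and the demo record is a toy value.

```python
def vcf_to_bed_lines(vcf_lines, chrom_prefix="chr"):
    """Convert single-base VCF records to BED lines (0-based start, exclusive end)."""
    bed = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information lines, and the column header
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        if len(ref) != 1:
            continue  # this sketch handles single-base reference alleles only
        if not chrom.startswith(chrom_prefix):
            chrom = chrom_prefix + chrom
        # VCF POS is 1-based; BED start is 0-based and BED end is exclusive.
        # The name column keeps the original coordinate for joining back later.
        bed.append(f"{chrom}\t{pos - 1}\t{pos}\t{fields[0]}:{pos}")
    return bed


toy_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "Y\t1000\t.\tA\tG\t.\t.\t.\n",
]
bed_lines = vcf_to_bed_lines(toy_vcf)
```

The resulting BED file can then be passed to `liftOver` along with a b38-to-b37 chain file, and the name column used to join the lifted positions back to the original records. This sketch does not touch the alleles, so any site that lifts to the opposite strand would need separate handling.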