arrogantrobot / 23andme2vcf

convert your 23andme raw file to VCF | DEPRECATED, please see https://github.com/plantimals/2vcf
MIT License

Broken genotypes #15

Open lhelseth opened 7 years ago

lhelseth commented 7 years ago

I keep getting just "0" or "1" for the genotype after ~560,000 correct conversions using the 23andme_v4_hg19_ref.txt.gz data, and after ~500,000 correct conversions using the 23andme_v3_hg19_ref.txt.gz data, when running "perl 23andme2vcf.pl <path to 23andme txt.zip file> genome_XYZ.vcf 4 (or 3)". In both cases the break happens after:

chrX 2689575 rs311150 G A . . . GT 1/1

I've seen this with two different individuals' SNP files, one generated in Feb 2015 and another generated in Dec 2016. Both the above rsID and the following one are still listed in the current dbSNP, and both entries in the 23andme_v5_ht19_ref.txt.gz appear valid. I've run this on a laptop and on a Linux server, so I don't think it's a resource issue. Any suggestions? Thanks.
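For anyone trying to pin down exactly where the output changes, here is a minimal Python sketch that scans a VCF and reports the first record whose genotype is not a two-allele call such as 1/1. It is not part of 23andme2vcf; the function name `first_non_diploid` is made up for illustration, it assumes a single sample column with GT listed in the FORMAT field, and the second sample record (rsHYPOTHETICAL at position 2700000) is invented test data, not a line from my files.

```python
import io


def first_non_diploid(lines):
    """Return (line_number, fields) for the first VCF record whose GT
    does not contain exactly two alleles, or None if every record does."""
    for n, line in enumerate(lines, start=1):
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()  # tolerate tabs or spaces between columns
        if len(fields) < 10:
            continue  # no FORMAT/sample columns to inspect
        fmt_keys = fields[8].split(":")
        if "GT" not in fmt_keys:
            continue
        gt = fields[9].split(":")[fmt_keys.index("GT")]
        # Treat phased (|) and unphased (/) separators the same way.
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2:
            return n, fields
    return None


# Illustrative input: the first record is the last good line quoted above,
# the second record is hypothetical and only mimics the "0"/"1" symptom.
sample = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE\n"
    "chrX\t2689575\trs311150\tG\tA\t.\t.\t.\tGT\t1/1\n"
    "chrX\t2700000\trsHYPOTHETICAL\tC\tT\t.\t.\t.\tGT\t1\n"
)

hit = first_non_diploid(io.StringIO(sample))
if hit is not None:
    line_number, fields = hit
    print(line_number, fields[0], fields[1], fields[2], fields[9])
```

To run it on real output, pass an open file handle instead of the sample string, for example `first_non_diploid(open("genome_XYZ.vcf"))`.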

Larry

arrogantrobot commented 7 years ago

Did you make a 23andme_v5_ht19_ref.txt.gz reference? I've included a v3 and v4. It's possible that leaving indels in a new reference would fail in that way, but I wouldn't expect it to get so far before encountering its first indel.

lhelseth commented 7 years ago

Rob, No, I just used the v4 (and then the v3, when prompted after running v4). Thanks.

Larry

On Mon, Dec 12, 2016 at 8:51 PM, Rob Long notifications@github.com wrote:

Did you make a 23andme_v5_ht19_ref.txt.gz reference? I've included a v3 and v4. It's possible that leaving indels in a new reference would fail in that way, but I wouldn't expect it to get so far before encountering its first indel.
