arrogantrobot / 23andme2vcf

convert your 23andme raw file to VCF | DEPRECATED, please see https://github.com/plantimals/2vcf
MIT License

hg18 reference #20

Open tikacp opened 6 years ago

tikacp commented 6 years ago

hi rob,

i tried using your script to convert the 23andme data from the personal genome project. it appears that the data is still aligned to hg18. could you provide the corresponding reference, or let me know how to compile it myself (for example, do you have a script that derives it from the ncbi fasta files)?

thanks tim

tikacp commented 6 years ago

for now i worked around the problem via liftOver (see below), but i still get ~30k sites that were not included. it might be better to get a proper reference from you, if you don't mind adding support for hg18. what i did:

unzip ref files

for f in *_ref.txt.gz; do gunzip "$f"; done

convert to bed format (losing allele info!)

for f in *_ref.txt; do awk 'BEGIN{FS="\t";OFS="\t"}{print $1,$2,$2+1,$3}' "$f" > "${f/.txt/.bed}"; done

liftOver to hg18

for f in *.bed; do ../liftOver/liftOver "$f" ../liftOver/hg19ToHg18.over.chain.gz "${f/hg19/hg18}" "${f/.bed/.unmapped.bed}"; done

re-convert to txt format and add allele info (which is assumed not to change) from hg19

for f in *_hg18_ref.bed; do awk 'BEGIN{FS="\t";OFS="\t"}{print $1,$2,$4}' "$f" | paste - <(cut -f4 "${f/hg18_ref.bed/hg19_ref.txt}") > "${f/.bed/.txt}"; done

re-zip ref files

for f in *_ref.txt; do gzip -9 "$f"; done
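
a possible variant of the steps above, as a rough sketch rather than a tested replacement. liftOver writes only the mapped records to its output, so a line-by-line paste against the full hg19 file can pair a site with a neighbouring site's allele once the first unmapped record is dropped. the sketch avoids the paste by carrying the allele through liftOver inside the BED name field. it assumes the ref files are tab-separated with columns chrom, 1-based position, rsid, ref allele, and it keeps the same assumption as above that the allele does not change between builds. the function names (ref_to_bed, bed_to_ref, count_unmapped) and the file names in the usage comment are made up for illustration.

```shell
#!/usr/bin/env bash
# Assumed ref layout (tab-separated): chrom, 1-based pos, rsid, ref allele.

# Ref text -> BED. BED intervals are 0-based and half-open, so a 1-based
# position p becomes [p-1, p). The allele rides along in the name field
# as "rsid|allele" so it stays attached to its own site.
ref_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" } { print $1, $2 - 1, $2, $3 "|" $4 }' "$1"
}

# Lifted BED -> ref text. The BED end coordinate is the 1-based position;
# the name field is split back into rsid and allele.
bed_to_ref() {
  awk 'BEGIN { FS = OFS = "\t" } { split($4, a, "|"); print $1, $3, a[1], a[2] }' "$1"
}

# Number of records in a liftOver "unmapped" file. liftOver precedes each
# unmapped record with a "#..." reason line, so those lines are skipped.
count_unmapped() {
  grep -vc '^#' "$1"
}

# Usage (hypothetical file names, liftOver paths as in the steps above):
#   ref_to_bed in_hg19_ref.txt > in_hg19_ref.bed
#   ../liftOver/liftOver in_hg19_ref.bed ../liftOver/hg19ToHg18.over.chain.gz \
#       in_hg18_ref.bed in_hg19_ref.unmapped.bed
#   bed_to_ref in_hg18_ref.bed > in_hg18_ref.txt
#   count_unmapped in_hg19_ref.unmapped.bed
```

this does not recover the sites that liftOver cannot map; it only keeps the mapped ones aligned with their alleles and reports how many were dropped.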