Open tikacp opened 6 years ago
for now i worked around the problem via liftOver (see below), but i still get ~30k sites that were not included. it might be better to get a proper reference from you, if you don't mind adding support for hg18. what i did:
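# unpack the reference files
for f in *_ref.txt.gz; do gunzip "$f"; done
# turn each site into a one-base BED interval for liftOver
for f in *_ref.txt; do awk 'BEGIN{FS="\t";OFS="\t"}{print $1,$2,$2+1,$3}' "$f" >"${f/.txt/.bed}"; done
# lift hg19 to hg18; sites that fail go to *.unmapped.bed
for f in *.bed; do ../liftOver/liftOver "$f" ../liftOver/hg19ToHg18.over.chain.gz "${f/hg19/hg18}" "${f/.bed/.unmapped.bed}"; done
# back to the reference layout, taking column 4 from the hg19 file
for f in *_hg18_ref.bed; do awk 'BEGIN{FS="\t";OFS="\t"}{print $1,$2,$4}' "$f" | paste - <(cut -f4 "${f/hg18_ref.bed/hg19_ref.txt}") >"${f/.bed/.txt}"; done
for f in *_ref.txt; do gzip -9 "$f"; done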
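to see how many sites were dropped per file, something like the following should work. this is only a sketch: `count_unmapped` is a helper name made up here, and it assumes the usual liftOver unmapped format, where each rejected record is preceded by a reason line starting with `#`.

```shell
# count_unmapped FILE: print how many records liftOver could not map.
# (hypothetical helper) reason lines start with "#", so only the
# non-comment lines are counted.
count_unmapped() {
    # grep -c exits non-zero when the count is 0, hence the "|| true"
    grep -vc '^#' "$1" || true
}

for f in *.unmapped.bed; do
    [ -e "$f" ] || continue   # skip if the glob matched nothing
    printf '%s\t%s\n' "$f" "$(count_unmapped "$f")"
done
```

running `grep '^#' FILE | sort | uniq -c` on one of the unmapped files would additionally break the count down by reason.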
hi rob,
i tried using your script for converting the 23andme data from the personal genome project. it appears that the data is still aligned to hg18. could you provide the corresponding reference or let me know how to compile it myself (e.g. do you have a script that derives it from the ncbi fasta files?)
thanks tim