USF-HII / snptk

USF HII SNP Toolkit - Analyze and translate SNP entries using NCBI dbSNP and related databases
GNU General Public License v3.0

Parse GRCh37 vcf files into flat file for digestion into snptk #7

Closed j2moreno closed 4 years ago

j2moreno commented 4 years ago

https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.25.gz

j2moreno commented 4 years ago

The GRCh37 VCF file lists chromosomes as:

$ bcftools query -f '%ID %CHROM %POS\n' tmp/data/grch37_vcf-03-11-2020/GCF_000001405.25.gz | head

rs775809821 NC_000001.10 10019
rs1008829651 NC_000001.10 10043

j2moreno commented 4 years ago

Fields needed so that snptk can properly read the file: ID, CHROM, and POS.

bcftools query -f '%ID %CHROM %POS\n' <input_file>

will be used to extract the necessary fields.

j2moreno commented 4 years ago

https://github.com/USF-HII/snptk/commit/182fac2733b3eb36b1ba550a6eb994cd9b40be09

j2moreno commented 4 years ago

GRCh37 VCF chromosomes are not given as plain chromosome names. The CHROM column holds RefSeq accessions (e.g. NC_000001.10 for chromosome 1).

$ zcat tmp/data/dbsnp-GRCh37.gz | head
rs775809821 NC_000001.10 10019
rs978760828 NC_000001.10 10039
rs1008829651 NC_000001.10 10043
rs1052373574 NC_000001.10 10051
rs1326880612 NC_000001.10 10051
rs768019142 NC_000001.10 10055

An additional script is needed to map SNPs to the correct chromosome name.
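
A minimal sketch of what such a mapping could look like, assuming the flat file keeps the "rsid accession pos" layout shown above. The function names are hypothetical and this is not necessarily how the commit below implements it. It relies on the RefSeq convention that NC_000001 to NC_000022 are the autosomes, NC_000023 is X, NC_000024 is Y, and NC_012920 is the mitochondrial genome; the version suffix (e.g. `.10`) is ignored.

```python
import re

# RefSeq chromosome accessions look like NC_000001.10:
# "NC_" prefix, six-digit number, then a version suffix.
_ACCESSION = re.compile(r"^NC_(\d{6})\.\d+$")


def accession_to_chrom(accession):
    """Return '1'..'22', 'X', 'Y' or 'MT' for a RefSeq accession, else None."""
    # The mitochondrial genome has its own accession, so check it first.
    if accession.split(".")[0] == "NC_012920":
        return "MT"
    match = _ACCESSION.match(accession)
    if match is None:
        return None  # unplaced contigs, patches, etc. (e.g. NT_/NW_ accessions)
    number = int(match.group(1))
    if 1 <= number <= 22:
        return str(number)
    return {23: "X", 24: "Y"}.get(number)


def convert_line(line):
    """Rewrite one 'rsid accession pos' line; return None if the contig is unmapped."""
    rsid, accession, pos = line.split()
    chrom = accession_to_chrom(accession)
    if chrom is None:
        return None
    return f"{rsid} {chrom} {pos}"
```

Lines on contigs that are not primary chromosomes come back as None, so the caller can decide whether to skip or log them.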

j2moreno commented 4 years ago

https://github.com/USF-HII/snptk/commit/9a696454e05c9081f5d7f0c9abdbae961eb58a3e