Same rsID assigned to two different SNPs

matteofloris commented 2 years ago

I converted the summary statistics of dataset GCST90001585 (from the GWAS Catalog) with gwas2vcf. I then used the VCF to run a colocalization test of this dataset against itself with gwasglue (https://mrcieu.github.io/gwasglue/articles/colocalisation.html), just for debugging, but I received an error:

vres <- coloc::coloc.abf(vout[[1]], vout[[2]])
Error in check_dataset(d = dataset1, 1) : dataset 1: duplicated snps found

The error is caused by the presence of many duplicated rsIDs. Here is an example from the VCF file produced by gwas2vcf:

1 2632053 rs1297622263 T C . PASS AF=0.0251 etc...
1 2632057 rs1297622263 C T . PASS AF=0.0003 etc...
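To gauge how widespread the problem is before running coloc, one option is to list every ID that appears at more than one position in the gwas2vcf output. A minimal Python sketch, assuming a tab-separated VCF read as text (plain or gzip/bgzip-compressed); `find_shared_ids` and `find_shared_ids_in_file` are hypothetical helpers, not part of gwas2vcf or gwasglue:

```python
import gzip
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_shared_ids(vcf_lines: Iterable[str]) -> Dict[str, List[Tuple[str, int]]]:
    """Return each ID that occurs at more than one (CHROM, POS) site."""
    sites = defaultdict(set)
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and meta/header lines
        chrom, pos, rsid = line.rstrip("\n").split("\t")[:3]
        if rsid == ".":
            continue  # records without an ID cannot collide
        sites[rsid].add((chrom, int(pos)))
    return {rsid: sorted(locs) for rsid, locs in sites.items() if len(locs) > 1}


def find_shared_ids_in_file(path: str) -> Dict[str, List[Tuple[str, int]]]:
    """Same check on a VCF on disk; gzip.open also reads bgzip output."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return find_shared_ids(handle)


if __name__ == "__main__":
    # The two records from the report, INFO column abbreviated.
    records = [
        "1\t2632053\trs1297622263\tT\tC\t.\tPASS\tAF=0.0251",
        "1\t2632057\trs1297622263\tC\tT\t.\tPASS\tAF=0.0003",
    ]
    print(find_shared_ids(records))
    # {'rs1297622263': [('1', 2632053), ('1', 2632057)]}
```

This only flags IDs shared across different positions; an ID repeated at the same position (e.g. split multi-allelic rows) is not reported, although coloc would still treat it as a duplicate.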

As you can see, the positions differ, but gwas2vcf assigned the same rsID to both. Looking at dbSNP:

1 2632030 rs1297622263 CTGACAGCCTGAAACAGCACCCTTCACCTTCAGGTGAGAATA C . . etc...

It is clear that the rsID of this large deletion was mistakenly assigned to two different SNPs. Why is this happening?
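The overlap can be confirmed from the dbSNP record alone: a VCF REF allele starting at POS covers positions POS through POS + len(REF) - 1. A small Python check using only the values quoted above (the helper names `span` and `base_at` are illustrative):

```python
DEL_POS = 2632030
DEL_REF = "CTGACAGCCTGAAACAGCACCCTTCACCTTCAGGTGAGAATA"


def span(pos: int, ref: str) -> range:
    """1-based positions covered by a REF allele that starts at pos."""
    return range(pos, pos + len(ref))


def base_at(pos: int, start: int, ref: str) -> str:
    """Reference base at pos, read from a REF allele that starts at start."""
    return ref[pos - start]


deletion = span(DEL_POS, DEL_REF)
print(len(DEL_REF), deletion.start, deletion[-1])  # 42 2632030 2632071
for snp_pos in (2632053, 2632057):
    # Both GWAS SNPs fall inside the deletion, and their REF bases (T, C)
    # agree with the deletion's REF sequence at those offsets.
    print(snp_pos, snp_pos in deletion, base_at(snp_pos, DEL_POS, DEL_REF))
# 2632053 True T
# 2632057 True C
```

So a dbSNP lookup that returns every record overlapping a position, rather than only records starting at it, would see rs1297622263 for both SNPs.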

mcgml commented 2 years ago

Hi @matteofloris

Thank you for raising this. Could you please provide the gwas2vcf command used to generate the VCF? And the link to the dbSNP VCF file used for annotation?

Thanks Matt

matteofloris commented 2 years ago

Dear Matt,

here is the command:

/usr/bin/python3.8 main.py --data test.csv --json params.json --id test --ref human_g1k_v37.fasta --dbsnp dbsnp.v153.b37.vcf.gz --out prova.vcf

the dbSNP file link is the following: http://fileserve.mrcieu.ac.uk/dbsnp/dbsnp.v153.b37.vcf.gz

The summary statistics I am using lack a SNP ID column; could this be a problem?

All the best,
Matteo

On Wed, 29 Jun 2022 at 14:06, Matthew Lyon <@.***> wrote:


--

Matteo Floris, PhD Associate Professor - Medical Genetics Department of Biomedical Sciences University of Sassari (Italy) Tel +39.333.48.57.679 email: @., @.

Everything I don't know, I learned at school (Leo Longanesi)

mcgml commented 2 years ago

Thank you @matteofloris. This is due to the upstream spanning deletion which had not been accounted for in the query. I have fixed the code with commit f4790a502e9c2ab2322322f45fa86a3540a20e02.
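The commit itself is not reproduced here. As an illustration of the general idea only (a hypothetical helper, not the code in f4790a5), an annotator can accept a dbSNP rsID only when the record starts at the same position and carries matching alleles, so a spanning deletion returned by a region query is skipped. Tolerating swapped REF/ALT is an extra assumption in this sketch:

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class DbsnpRecord(NamedTuple):
    pos: int
    rsid: str
    ref: str
    alts: Tuple[str, ...]


def pick_rsid(pos: int, ref: str, alt: str,
              candidates: Iterable[DbsnpRecord]) -> Optional[str]:
    """Return the rsID of the first candidate matching pos and alleles, else None."""
    for rec in candidates:
        if rec.pos != pos:
            continue  # e.g. an upstream deletion that merely spans this position
        same = rec.ref == ref and alt in rec.alts
        swapped = rec.ref == alt and ref in rec.alts  # assumption: allow flipped alleles
        if same or swapped:
            return rec.rsid
    return None


if __name__ == "__main__":
    deletion = DbsnpRecord(2632030, "rs1297622263",
                           "CTGACAGCCTGAAACAGCACCCTTCACCTTCAGGTGAGAATA", ("C",))
    # Neither SNP starts where the deletion starts, so neither gets its rsID.
    print(pick_rsid(2632053, "T", "C", [deletion]))  # None
    print(pick_rsid(2632057, "C", "T", [deletion]))  # None
```

Requiring an exact start position plus allele agreement is stricter than a pure overlap query; it trades away matches for differently normalised indels in exchange for never borrowing an ID from a neighbouring variant.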

Best wishes Matt

matteofloris commented 2 years ago

Thank you so much Matt.

On Wed, 29 Jun 2022 at 14:40, Matthew Lyon <@.***> wrote:



MRCIEU / gwas2vcf, issue #76: Same rsID assigned to two different SNPs