chr1swallace / coloc

Repo for the R package coloc

error in check_dataset: duplicated SNPs #128

Open PyunJung-Min opened 11 months ago

PyunJung-Min commented 11 months ago

Hi, thanks for the amazing R package. I am new to COLOC.

I'm trying to integrate GWAS summary data (dataset 1) and eQTL summary data (dataset 2). If I understood correctly, the SNPs in dataset 1 and dataset 2 should be identical; is this right?

So I merged dataset 1 and dataset 2 by rsid. However, in the eQTL summary data several ENSG genes are matched to one SNP, so the merged data (dataset 1 and dataset 2) has many duplicated SNPs, each with a different ENSG gene.

How can I deal with this problem? Or am I preparing the datasets incorrectly?

Many thanks in advance

Jungmin

chr1swallace commented 11 months ago

You need to analyse each gene separately; after all, you are testing a separate colocalisation hypothesis for each gene.
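A minimal sketch of the per-gene split, assuming the eQTL table has been read into rows with hypothetical `gene`, `snp`, `beta` and `varbeta` fields (real column names depend on the eQTL source). It is written in Python only to illustrate the bookkeeping: each gene ends up with one row per SNP, which is what a single colocalisation test expects. In R, each per-gene subset would become its own `dataset2` in a separate `coloc.abf` call. Restricting each gene to SNPs that also appear in the GWAS is a choice made here for clarity, not a documented coloc requirement.

```python
from collections import defaultdict


def split_by_gene(eqtl_rows, gwas_snps):
    """Group eQTL rows by gene, keeping only SNPs also present in the GWAS.

    Returns {gene: [row, ...]} with exactly one row per SNP for each gene.
    Field names ("gene", "snp", ...) are hypothetical placeholders.
    """
    gwas_snps = set(gwas_snps)
    per_gene = defaultdict(dict)  # gene -> {snp: row}
    for row in eqtl_rows:
        if row["snp"] not in gwas_snps:
            continue  # SNP missing from the GWAS: skipped in this sketch
        snps = per_gene[row["gene"]]
        if row["snp"] in snps:
            # A duplicate *within* one gene is a genuine data problem.
            raise ValueError(f"SNP {row['snp']} duplicated within gene {row['gene']}")
        snps[row["snp"]] = row
    return {gene: list(snps.values()) for gene, snps in per_gene.items()}


# Made-up toy data: rs2 is tested against two genes, which is what
# produces duplicated SNPs when everything is merged into one table.
eqtl = [
    {"gene": "ENSG_A", "snp": "rs1", "beta": 0.10, "varbeta": 0.01},
    {"gene": "ENSG_A", "snp": "rs2", "beta": 0.30, "varbeta": 0.01},
    {"gene": "ENSG_B", "snp": "rs2", "beta": -0.05, "varbeta": 0.02},
    {"gene": "ENSG_B", "snp": "rs3", "beta": 0.02, "varbeta": 0.02},
]
gwas = ["rs1", "rs2", "rs3"]

datasets = split_by_gene(eqtl, gwas)
for gene, rows in sorted(datasets.items()):
    print(gene, [r["snp"] for r in rows])
```

Each gene's list is duplicate-free on its own, so the same GWAS dataset can be paired with each gene's eQTL dataset in turn.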

-- https://chr1swallace.github.io


From: PyunJung-Min. Sent: Friday, August 11, 2023 5:12:13 AM. To: chr1swallace/coloc. Subject: [chr1swallace/coloc] error in check_dataset: duplicated SNPs (Issue #128)

PyunJung-Min commented 11 months ago

Thanks for your prompt reply!!

However, my eQTL summary data has 19250 genes. Is there a smart way to analyse all 19250 genes at once, instead of running "coloc.abf" 19250 times?

Thanks!

Jung-Min

chr1swallace commented 10 months ago

Sorry, no. But you probably don't want to run all 19250 genes. You know whether each of them has a significant signal in your region of interest, so you can discard the rest.
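One way to apply this pre-filter, sketched under assumptions: the eQTL rows are already restricted to the region of interest, each row carries a hypothetical `pvalue` field, and a gene is kept if its smallest eQTL p-value falls below a threshold. The thresholds used below (5e-8 and 1e-5) are illustrative choices, not values recommended in this thread; looping over several of them also shows how the candidate gene list changes with the cut-off.

```python
def genes_with_signal(eqtl_rows, p_threshold=1e-5):
    """Return genes whose smallest eQTL p-value in the region is below p_threshold.

    Assumes rows are already limited to the region of interest and use
    hypothetical "gene" and "pvalue" fields.
    """
    min_p = {}
    for row in eqtl_rows:
        gene, p = row["gene"], row["pvalue"]
        if gene not in min_p or p < min_p[gene]:
            min_p[gene] = p  # track the strongest eQTL signal per gene
    return sorted(g for g, p in min_p.items() if p < p_threshold)


# Made-up toy data for three genes in one region.
eqtl = [
    {"gene": "ENSG_A", "snp": "rs1", "pvalue": 3e-9},
    {"gene": "ENSG_A", "snp": "rs2", "pvalue": 0.20},
    {"gene": "ENSG_B", "snp": "rs2", "pvalue": 0.04},
    {"gene": "ENSG_C", "snp": "rs3", "pvalue": 2e-6},
]

for threshold in (5e-8, 1e-5):
    print(threshold, genes_with_signal(eqtl, threshold))
```

Only the genes that survive the filter would then need their own colocalisation test.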

PyunJung-Min commented 10 months ago

Thank you for the answer! :)

My goal in using COLOC is to identify causal (target) genes by integrating GWAS summary data for the disease with eQTL summary data. I would like to select target genes using various p-value thresholds; that's why I tried to run COLOC on all 19250 genes.

Could you please advise how to approach this? I would appreciate any comments. :) Many thanks

Jung-Min