chr1swallace / coloc

Repo for the R package coloc
144 stars 44 forks

Little confusion about required coloc inputs #73

Closed: Filago4 closed this issue 2 years ago

Filago4 commented 2 years ago

Hello!

I am using coloc with

  1. dataset 1: "snp", "pos", "beta", "varbeta" and type = "cc"
  2. dataset 2: "snp", "pos", "beta", "varbeta", type = "quant" and a given "sdY"

No error or warning occurs and everything seems fine so far :D

However, some online documentation says that I also have to give "N" and "MAF" (in one or both datasets, or a general MAF for both) and, especially for dataset 1 (case-control), "s", the proportion of cases. Yet no warning tells me that these three are missing, and even if I supply them, the outcome does not change.

My question: Are my inputs sufficient for colocalization analysis and can I leave out MAF, s and N (I am a big fan of short and simple code :) )?

Many thanks for your help :)
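Why supplying s, N or MAF changes nothing here: to our understanding, when beta and varbeta are given, coloc.abf turns each SNP into a Wakefield approximate Bayes factor whose prior effect sd is 0.2 (log odds ratio scale) for type = "cc" and 0.15 * sdY for type = "quant". The Python sketch below is our own illustration under those assumptions, not the package's code, and the name `approx_log_abf` is hypothetical. Note that s, N and MAF never enter the calculation.

```python
import math


def approx_log_abf(beta, varbeta, trait_type, sdY=1.0):
    """Log approximate Bayes factor (Wakefield) for one SNP.

    Assumption: prior effect sd is 0.2 for "cc" and 0.15 * sdY for "quant",
    which is our reading of coloc's defaults. s, N and MAF are not used.
    """
    sd_prior = 0.15 * sdY if trait_type == "quant" else 0.2
    W = sd_prior ** 2                 # prior variance of the true effect
    z = beta / math.sqrt(varbeta)     # Wald z statistic
    r = W / (W + varbeta)             # shrinkage ratio W / (W + V)
    return 0.5 * (math.log(1 - r) + r * z ** 2)


# Dataset 1 (case-control): beta is a log odds ratio; sdY is ignored.
print(approx_log_abf(beta=0.12, varbeta=0.0009, trait_type="cc"))
# Dataset 2 (quantitative): sdY scales the prior, so it matters.
print(approx_log_abf(beta=0.30, varbeta=0.0025, trait_type="quant", sdY=1.0))
```

In this sketch a case-control dataset needs only beta and varbeta, and a quantitative one additionally needs sdY, which matches the observation that adding s, N and MAF did not change the result.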

chr1swallace commented 2 years ago

Please see http://chr1swallace.github.io/coloc/articles/a02_data.html

https://chr1swallace.github.io


From: Filago4 @.> Sent: Saturday, January 15, 2022 2:50:03 AM To: chr1swallace/coloc @.> Cc: Subscribed @.***> Subject: [chr1swallace/coloc] Little confusion about required coloc inputs (Issue #73)


Filago4 commented 2 years ago

Many thanks for this link! From the explanations there I see that N, MAF and s are only required in case-control designs to estimate the standard deviation of the trait, but that they are not necessary if I have given beta and varbeta. Is that correct?
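For context, N and MAF matter when a quantitative trait's sdY is unknown and has to be estimated. Under the usual approximation var(beta) ≈ sdY^2 / (2 N MAF (1 - MAF)), regressing 2 N MAF (1 - MAF) on 1/varbeta through the origin gives a slope of sdY^2. We believe coloc does something along these lines internally (with a warning) when sdY is missing; the Python below is only a sketch of the idea, and `estimate_sdY` is a hypothetical name.

```python
import math


def estimate_sdY(varbeta, maf, n):
    """Estimate the trait sd from per-SNP varbeta, MAF and sample size n.

    Assumes var(beta) ~= sdY^2 / (2 * n * MAF * (1 - MAF)) for each SNP, so
    2*n*MAF*(1-MAF) regressed on 1/varbeta (no intercept) has slope sdY^2.
    """
    x = [1.0 / v for v in varbeta]
    y = [2.0 * n * m * (1.0 - m) for m in maf]
    # least-squares slope for a regression through the origin
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    if slope <= 0:
        raise ValueError("non-positive slope; cannot estimate sdY")
    return math.sqrt(slope)


# Synthetic check: build varbeta from a known sdY, then recover it.
n, true_sdY = 500, 1.0
maf = [0.05, 0.12, 0.25, 0.4]
varbeta = [true_sdY ** 2 / (2 * n * m * (1 - m)) for m in maf]
print(estimate_sdY(varbeta, maf, n))
```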

The reason for my confusion is other documentation, e.g. here: https://www.quantargo.com/help/r/latest/packages/coloc/5.1.0/coloc.abf or here: https://github.com/cran/coloc/blob/master/R/claudia.R

where it says: "Some of these items may be missing, but you must give

  1. always: type
  2. if type=="cc": s
  3. if type=="quant" and sdY known: sdY
  4. if type=="quant" and sdY unknown: beta, varbeta, N, MAF
  5. and then either: pvalues, MAF
  6. or: beta, varbeta"

This is a little confusing, as item 2 implies that I would need "s" when using type = "cc". However, from your link I conclude that "s" is not needed if beta and varbeta are available.
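The rules as they come out of this thread (and, as we read it, the coloc data vignette) can be written as a small checker. This is a hypothetical helper, not coloc's own input validation; it only looks at which fields are present. With beta and varbeta supplied, a case-control dataset needs nothing else, and a quantitative one needs sdY or, failing that, MAF and N.

```python
def missing_fields(dataset):
    """List what a coloc-style dataset (a dict of fields) still lacks.

    Hypothetical helper encoding the rules discussed in this thread; it is
    not coloc's own input check and only looks at which keys are present.
    """
    have = set(dataset)
    missing = [f for f in ("snp", "type") if f not in have]
    if not {"beta", "varbeta"} <= have:
        if "pvalues" not in have:
            missing.append("beta and varbeta (or pvalues)")
        else:
            # p-values only: MAF and N are needed to recover effect-size variances
            missing += [f for f in ("MAF", "N") if f not in have]
            if dataset.get("type") == "cc" and "s" not in have:
                missing.append("s")  # proportion of cases
    if dataset.get("type") == "quant" and "sdY" not in have:
        # unknown sdY must be estimated, which needs MAF and N
        missing += [f for f in ("MAF", "N") if f not in have and f not in missing]
    return missing


# The two datasets described in the original question (values made up).
d1 = {"snp": ["rs1"], "pos": [101], "beta": [0.12], "varbeta": [0.0009], "type": "cc"}
d2 = {"snp": ["rs1"], "pos": [101], "beta": [0.30], "varbeta": [0.0025],
      "type": "quant", "sdY": 1.0}
print(missing_fields(d1), missing_fields(d2))  # -> [] []
```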

Am I understanding something wrong? And one other short question: my eQTL data (GTEx) are normalized. If I check the expression matrices from GTEx and calculate sdY for a given gene (just by calculating the standard deviation over all samples), it is about 1.0, and when plotted it looks like a nice normal distribution. Is it then right to assume sdY = 1?
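Computing a per-gene sdY from an expression matrix, as described above, takes one line per gene. This is a sketch with hypothetical gene names and values; Python's `statistics.stdev`, like R's `sd()`, uses the n - 1 denominator, so the two should agree on the same data.

```python
import statistics


def per_gene_sdY(expression):
    """Sample sd (n - 1 denominator, as in R's sd()) of each gene across samples."""
    return {gene: statistics.stdev(values) for gene, values in expression.items()}


# Hypothetical normalized expression values for two genes across five samples.
expression = {
    "GENE_A": [-1.2, -0.4, 0.1, 0.5, 1.0],
    "GENE_B": [-0.9, -0.3, 0.0, 0.2, 0.7],
}
print(per_gene_sdY(expression))
```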

chr1swallace commented 2 years ago

The link is the correct reference, and yes, sdY = 1 for GTEx.


Filago4 commented 2 years ago

Many thanks! Then I'll keep my code in the "minimal version" :) About GTEx: the standard deviation of the traits is actually not exactly 1.00 if you calculate it from the GTEx expression matrices with the simple sd() function in R, but something close to it, e.g. 0.98 in most cases, or 0.7 in the worst case. Would you say it is better to work with each gene's exact sd rather than the theoretical one (1.0)?

chr1swallace commented 2 years ago

I would work with the exact sd if you know it.
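To see how much the choice can matter: for a quantitative trait the prior effect sd scales with sdY (0.15 * sdY, assuming coloc's default as we understand it), so a gene with sd 0.7 gets a noticeably narrower prior than one assumed to have sdY = 1. This sketch recomputes one SNP's log approximate Bayes factor for the sd values mentioned in this thread; `quant_log_abf` is a hypothetical name and the beta and varbeta values are made up for illustration.

```python
import math


def quant_log_abf(beta, varbeta, sdY):
    """Wakefield log ABF for a quantitative trait, prior sd assumed 0.15 * sdY."""
    W = (0.15 * sdY) ** 2          # prior variance of the true effect
    r = W / (W + varbeta)          # shrinkage ratio
    z2 = beta ** 2 / varbeta       # squared Wald statistic
    return 0.5 * (math.log(1 - r) + r * z2)


# Made-up SNP on the normalized expression scale: beta = 0.30, varbeta = 0.0025 (z = 6).
for sdY in (1.0, 0.98, 0.7):
    print(sdY, round(quant_log_abf(0.30, 0.0025, sdY), 3))
```

With these numbers, using 0.98 instead of 1.0 moves the log ABF by only about 0.05, while 0.7 shifts it by more than one log unit, which is consistent with the advice to use the exact sd when it is known.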