Closed by M-jk22.
Hi M-jk22, there is a subset of binary disease studies (0 = controls, 1 = cases) whose results were not generated using logistic regression. Most of the studies generated by the MRC-IEU pipeline (and, I believe, also those from the Ben Neale lab) were analysed using BOLT-LMM, which fits a linear mixed model (a form of linear regression). The beta from this model should not be exponentiated, because the result won't have a sensible interpretation. The beta is better interpreted as the change in absolute disease risk per unit change in the exposure. To transform it to a log odds ratio, you can use this formula:
# Convert BOLT-LMM effects (absolute-risk scale) to log odds ratios
# Formula: log OR = beta / (u * (1 - u)), where u = ncase / (ncase + ncontrol)
# Apply the same transformation to the standard error
# Example (ukbb_all is a data frame of GWAS summary statistics):
ukbb_all$ncase <- 52400       # number of cases
ukbb_all$ncontrol <- 372016   # number of controls
ukbb_all$u <- ukbb_all$ncase / (ukbb_all$ncase + ukbb_all$ncontrol)   # case proportion
ukbb_all$beta <- ukbb_all$beta / (ukbb_all$u * (1 - ukbb_all$u))     # log OR
ukbb_all$se <- ukbb_all$se / (ukbb_all$u * (1 - ukbb_all$u))         # SE of the log OR
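One way to see where the u(1-u) factor comes from is a first-order approximation of the logit around the sample case proportion u. This is offered as a sketch of the intuition, assuming the usual linear-probability-to-logit argument; it is not necessarily the exact derivation used by the MRC-IEU authors:

$$
\operatorname{logit}(p) = \log\frac{p}{1-p}, \qquad \frac{d}{dp}\operatorname{logit}(p) = \frac{1}{p(1-p)}
$$

$$
u = \frac{n_{\text{case}}}{n_{\text{case}} + n_{\text{control}}}, \qquad
\log\mathrm{OR} \approx \frac{\beta}{u(1-u)}, \qquad
\mathrm{SE}(\log\mathrm{OR}) \approx \frac{\mathrm{SE}(\beta)}{u(1-u)}
$$

In words: a small change β in absolute risk near u shifts the log odds by roughly β/(u(1-u)), and the standard error is scaled by the same constant.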
Could you let me know your study's ID, consortium and units fields? This issue affects all studies with an ID starting with ukb-b or with the consortium field set to MRC-IEU. I think the units field will also be set to something like "absolute risk".
Hello, thank you for your response. The outcome data ID I am using is ieu-b-4960, and the exposure data ID is ebi-a-GCST90001470. I have encountered similar situations with other outcome data, for example ukb-b-6353 and ukb-a-419. Should I now use a different method to calculate the OR and 95% CI? Are there parameters in TwoSampleMR that can help avoid such results?
Yes, those other outcome studies (ieu-b-4960, ukb-b-6353 and ukb-a-419) are also subject to this "issue". Could you also paste the following data fields in your response: consortium and units? If the units field contains "absolute risk", then the effect size cannot be interpreted as a log odds ratio. You can use the formula in my previous response to transform your MR result into a log odds ratio, then exponentiate that into an odds ratio. Further details on the transformation can be found here: https://data.bris.ac.uk/data/dataset/aed0u12w0ede20olb0m77p4b9
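Putting both steps together for a TwoSampleMR result: a minimal sketch, assuming the result is a data frame with `b` and `se` columns (as returned by `mr()`) and that you supply `ncase` and `ncontrol` for the outcome GWAS yourself. The helper name `abs_risk_to_or` is hypothetical (it is not part of TwoSampleMR), and the 1.96 multiplier gives a normal-approximation 95% CI. The usage line reuses the IVW numbers worked through in this thread.

```r
# Hypothetical helper: rescale an mr()-style result from the absolute-risk
# scale to a log odds ratio, then report the OR and its 95% CI.
abs_risk_to_or <- function(res, ncase, ncontrol) {
  u <- ncase / (ncase + ncontrol)   # case proportion in the outcome GWAS
  k <- u * (1 - u)                  # scaling constant from the formula above
  res$b_logor  <- res$b / k
  res$se_logor <- res$se / k        # the SE gets the same transformation
  res$or       <- exp(res$b_logor)
  res$or_lci95 <- exp(res$b_logor - 1.96 * res$se_logor)
  res$or_uci95 <- exp(res$b_logor + 1.96 * res$se_logor)
  res
}

# Usage with the IVW estimate discussed in this thread
mr_res <- data.frame(method = "Inverse variance weighted",
                     b = 0.004257812, se = 0.001605194)
out <- abs_risk_to_or(mr_res, ncase = 37362, ncontrol = 423174)
print(out[, c("b_logor", "se_logor", "or", "or_lci95", "or_uci95")])
```

The original `mr_res` columns are left in place, so the absolute-risk estimate and the converted log OR can be reported side by side.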
The consortium and units fields are:
ukb-b-6353: consortium MRC-IEU, units SD
ukb-a-419: consortium Neale Lab, units SD
ieu-b-4960: consortium UK Biobank, units NA
I've also seen these data used in other people's research, so I'm confused.
Hello, sorry to bother you. Currently, I only know how to use the TwoSampleMR package to conduct MR analyses, and I am not sure whether TwoSampleMR has a function or parameter to avoid this situation. If not, I will have to disregard these data and use outcome data that produce a valid OR and 95% CI. Thank you for your response.
Sorry to bother you again. In the results from the mr() function, there is a column named b. This is what you referred to as the log OR. So should I just look at the b value and not the OR? When b > 0 it indicates a risk factor, and when b < 0 it indicates a protective factor. Is my understanding correct?
Yes, only the b value will be valid. I would ignore the OR value if there is one. Yes, your interpretation makes sense.
Thank you for your reply. This is indeed one approach: focusing only on the b value since, as you mentioned, the b value is unaffected and reliable. However, if I need to conduct a meta-analysis and must use OR values, then I will have to give up on these data.
It's easy to transform the b into a log odds ratio using the formula I shared above (the transformation must also be applied to the standard error).
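Because b and its SE are both divided by the same positive constant u(1-u), the conversion changes the scale but not the sign of b, the z-statistic b/se, or (under a normal approximation) the p-value. That is why reading only the sign of b, as discussed above, still gives the right direction of effect. A minimal check in R, reusing the IVW numbers from this thread:

```r
# Values from the IVW example in this thread (absolute-risk scale)
b_abs    <- 0.004257812
se_abs   <- 0.001605194
ncase    <- 37362
ncontrol <- 423174

u <- ncase / (ncase + ncontrol)   # case proportion in the outcome GWAS
k <- u * (1 - u)                  # positive whenever 0 < u < 1

b_logor  <- b_abs / k
se_logor <- se_abs / k

# Same constant for b and se, so z = b / se and its two-sided
# normal-approximation p-value are identical on both scales.
z_abs   <- b_abs / se_abs
z_logor <- b_logor / se_logor
p_abs   <- 2 * pnorm(-abs(z_abs))
p_logor <- 2 * pnorm(-abs(z_logor))
print(c(z_abs = z_abs, z_logor = z_logor, p_abs = p_abs, p_logor = p_logor))
```

The p-value reported by mr() may use a different reference distribution depending on the method, but whichever it is, the ratio b/se it is based on is unchanged by the rescaling.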
I understand that "ukbb_all$ncase <- 52400" and "ukbb_all$ncontrol <- 372016" refer to the numbers of cases and controls in the outcome file.
For the log OR formula "ukbb_all$beta / (ukbb_all$u * (1 - ukbb_all$u))", does this beta come from the results calculated by mr()?
Yes, the beta (and se) come from the results generated by mr(). ncase and ncontrol are the numbers of cases and controls in your analysis, which you will have to supply.
I'm sorry to bother you. I'm quite weak in this area. Is this result correct?
Could you supply your number of cases, as well as b and se from mr(), so that I can check?
Thank you very much. I have uploaded the results file (table.MRresult.csv). Finally, how can I get the new OR and 95% CI?
ncase = 37362; ncontrol = 423174
I apologize for the disturbance, and I would be extremely grateful if I could receive a reply.
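For exposure 1, using the inverse variance weighted (IVW) method, I estimated:
b <- 0.004257812      # IVW estimate from mr() (absolute-risk scale)
se <- 0.001605194
ncase <- 37362        # cases in the outcome GWAS
ncontrol <- 423174    # controls in the outcome GWAS
u <- ncase / (ncase + ncontrol)
b <- b / (u * (1 - u))      # log OR
se <- se / (u * (1 - u))    # SE of the log OR
b
# [1] 0.05711689
se
# [1] 0.02153305
exp(b)                # OR
# [1] 1.05878
exp(b - 1.96 * se)    # lower 95% CI
# [1] 1.015024
exp(b + 1.96 * se)    # upper 95% CI
# [1] 1.104422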
Reference for the transformation approach: https://data.bris.ac.uk/data/dataset/aed0u12w0ede20olb0m77p4b9
Thank you for your reply. One last point to confirm: ncase and ncontrol belong to the outcome data, not the exposure data? Thank you so much for your patience.
Yes, ncase and ncontrol should correspond to the outcome data.
Hi,
I hope this message finds you well. I greatly benefited from reading your insightful responses to the previous inquiries. Thank you for your expertise! My situation is similar yet slightly different from the previous individual's.
I've noticed that when IEU data with IDs starting with ukb-b, or with the consortium field set to MRC-IEU, are used as the exposure (with a binary outcome), the MR estimates show very large beta and se values. This may distort the calculated OR.
In the previous case, where IEU data were used as the outcome, you recommended adjusting the beta and se values using the formulas "b <- b / (u * (1 - u))" and "se <- se / (u * (1 - u))", which increases the beta and se values.
My questions are therefore: what formula should I use to correct for this scenario, and is there published literature or another reference supporting this way of processing the data? Thank you for your assistance!
In your case, when one of those datasets is used to define the exposure, the issue is one of scaling. The MR result reflects the change in the outcome per 1-unit increase in the exposure. Here, a 1-unit increase in the exposure means a 1-unit change in absolute disease risk, i.e. in the proportion of the sample that are cases, which ranges from 0 to 1.

Imagine the exposure is "risk of coronary heart disease (CHD)", and assume there are 50,000 CHD cases in a cohort of 450,000. That's an absolute risk of CHD of 50,000/450,000 ≈ 0.11 (an 11% probability of being a case). A change in absolute risk of 1 would imply 0.11 + 1 = 1.11, i.e. an impossible probability of being a case of 111%.

The solution is to scale your result to reflect something more realistic. For a normally distributed exposure, the simple solution would be to standardise the exposure into standard deviation units. However, you can't do that here because your exposure is a binary variable. Instead, I would scale the MR result to reflect a more realistic change in risk. For example, you could scale your MR result to reflect a 10% (0.1) change in absolute risk by dividing the beta and SE from the MR output by 10. Whether this is a sensible scale depends on your disease and its prevalence in the population: it might be fine for a common disease like coronary heart disease but probably not for a very rare disease.

Whatever scale you choose, when the exposure is binary disease status, the MR output should only be used to indicate whether liability to disease is associated with your outcome of interest, and whether this relationship is positive or negative. The actual effect sizes don't have much clinical meaning and shouldn't be taken too seriously.
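The rescaling described above can be sketched in R as follows, assuming a binary outcome whose MR estimate is on the log OR scale. The helper name `rescale_per_risk_change` and the input values are hypothetical, chosen only to illustrate the arithmetic; they are not results from this thread. Multiplying by delta = 0.1 is the same as dividing the beta and SE by 10.

```r
# Hypothetical helper (not part of TwoSampleMR): re-express an MR estimate whose
# exposure is on the absolute-risk scale (0 to 1) per a smaller change in risk.
rescale_per_risk_change <- function(b, se, delta = 0.1) {
  # A 1-unit change in absolute risk is a 100% change; multiplying by
  # delta = 0.1 (i.e. dividing by 10) gives the effect per 10% change.
  list(b = b * delta, se = se * delta)
}

# Hypothetical IVW output (illustration only, not a real result):
# outcome on the log OR scale, exposure = liability to a binary disease
b_ivw  <- 0.8
se_ivw <- 0.3

scaled <- rescale_per_risk_change(b_ivw, se_ivw, delta = 0.1)
or_per_10pct  <- exp(scaled$b)
lci_per_10pct <- exp(scaled$b - 1.96 * scaled$se)
uci_per_10pct <- exp(scaled$b + 1.96 * scaled$se)
print(round(c(OR = or_per_10pct, LCI = lci_per_10pct, UCI = uci_per_10pct), 3))
```

As with the outcome-side conversion, b and se are scaled by the same constant, so the direction of effect and the p-value do not change; only the size of the OR is re-expressed, which fits the advice above to focus on direction rather than magnitude.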
Thank you very much for your valuable insights. This has helped me a lot! I believe that the Q&A on this topic will help even more people in the future!
Hi, I hope this message finds you well. I am reaching out to seek your expertise regarding an issue I encountered while using the TwoSampleMR package for my analysis.
In my recent analysis, I observed a situation where the results showed an IVW P-value of less than 0.05. However, the odds ratio (OR) was calculated to be 1.005642193, with a 95% confidence interval of 1.001845102 to 1.009453676. This finding is not an isolated case, as I have encountered similar outcomes in repeated analyses.
I am uncertain if such results are correct or usable, and I am keen to understand the potential reasons behind this occurrence. Your insights or guidance on this matter would be immensely valuable to me.
Thank you very much for your time and consideration. I look forward to your response.
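To see how an OR this close to 1 can still come with P < 0.05, one can back-calculate the standard error and z-statistic from the reported interval. This is a sketch assuming the 95% CI is a symmetric Wald interval on the log scale (log OR ± 1.96 × SE), which is how the OR and CI are obtained elsewhere in this thread; it says nothing about whether the effect is meaningful. Also note that if the outcome is one of the ukb-b / MRC-IEU binary traits discussed above, exponentiating b does not give an odds ratio in the first place.

```r
# OR and 95% CI as reported in the question above
or  <- 1.005642193
lci <- 1.001845102
uci <- 1.009453676

log_or <- log(or)
# Assumes a Wald interval: log(uci) - log(lci) = 2 * 1.96 * se
se <- (log(uci) - log(lci)) / (2 * 1.96)
z  <- log_or / se
p  <- 2 * pnorm(-abs(z))

# For a Wald-type interval, the log-scale midpoint should recover log(OR)
midpoint <- (log(lci) + log(uci)) / 2
print(c(log_or = log_or, midpoint = midpoint, se = se, z = z, p = p))
```

A very narrow interval (small SE, as is typical with large biobank samples) can exclude 1 even when the point estimate is close to 1, so a small OR and a small p-value are not contradictory.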