Hi Yijun, This issue is a little hard to troubleshoot just based on the output--would you be able to share two of the summary statistic files so that I can reproduce and diagnose the issue? My email is anadig [at] broadinstitute [dot] org.
Best, Ajay
Thanks so much and I have just sent the files to this address!
Any update on this closed thread? I found the same problem.
Hi Charley--I replied to @Asukayj yesterday, see below for my message:
I think I've diagnosed the issue. Bear with me, it is a little technical:
In a regression-based estimator like BHR, collinearity can cause signal to "leak" from one term into another. In the case of BHR, we noticed that the intercepts from the pLoF models were often greater than those from the synonymous models (see Supp Fig 8, first set of bar plots from left), suggesting that this was happening. To address this, we added "null burden statistics" to the regression; this is described in the methods paragraph that begins "Second, we incorporate null burden statistics that effectively fix the BHR intercept and ameliorate bias in its slope...", and in Supplementary Figure 8 from our published manuscript, which shows the synonymous and pLoF intercepts getting closer as we include more and more sets of null burden statistics in the regression.
It was less obvious how to implement this for genetic correlation, so the current rg method does not use null burden statistics for heritability or covariance estimation. At the time, it was unclear how much this would matter in practice, and it didn't seem to affect our results very much. However, this appears to make a big difference for the traits you are analyzing.
The non-significant heritability from the rg function is likely the cause of the odd rg estimates; in the setting of small/nonsignificant h2, rg estimates are generally very unstable.
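To illustrate why a small h2 destabilizes rg, here is a toy simulation in R. It is not BHR code: it only assumes the usual definition of rg as the covariance divided by the square root of the product of the two heritabilities, and every number in it (the sampling SE, the true h2 values, the true rg of 0.5) is hypothetical. The helper names `rg_ratio` and `simulate_rg` are made up for this sketch.

```r
# Toy simulation, not BHR code: all numbers below are hypothetical.
set.seed(1)
n_rep <- 10000   # number of simulated replicate analyses
se    <- 0.01    # assumed sampling SE of each h2 and covariance estimate

# rg as the usual ratio: covariance / sqrt(h2_1 * h2_2).
# Returns NA when either h2 estimate is non-positive, where the ratio is undefined.
rg_ratio <- function(cov12, h2_1, h2_2) {
  ok <- h2_1 > 0 & h2_2 > 0
  rg <- rep(NA_real_, length(cov12))
  rg[ok] <- cov12[ok] / sqrt(h2_1[ok] * h2_2[ok])
  rg
}

# Draw noisy h2 and covariance estimates around their true values,
# then form the rg ratio for each replicate.
simulate_rg <- function(true_h2, true_rg = 0.5) {
  h2_1  <- rnorm(n_rep, mean = true_h2, sd = se)
  h2_2  <- rnorm(n_rep, mean = true_h2, sd = se)
  cov12 <- rnorm(n_rep, mean = true_rg * true_h2, sd = se)
  rg_ratio(cov12, h2_1, h2_2)
}

rg_strong <- simulate_rg(true_h2 = 0.10)  # h2 is 10 SEs from zero
rg_weak   <- simulate_rg(true_h2 = 0.01)  # h2 is 1 SE from zero

summary_tab <- data.frame(
  scenario = c("h2 = 0.10", "h2 = 0.01"),
  frac_NA  = c(mean(is.na(rg_strong)), mean(is.na(rg_weak))),
  IQR_rg   = c(IQR(rg_strong, na.rm = TRUE), IQR(rg_weak, na.rm = TRUE))
)
print(summary_tab)
```

With the same sampling noise, the low-h2 scenario should show both a larger fraction of undefined (NA) estimates and a much wider spread of rg, which is the pattern of NAs and large SEs reported in this thread.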
I will discuss this issue with my collaborators, but I believe that a quick fix is to simply use null burden statistics for heritability but not covariance estimation within the rg function. To that end, I have changed the code on GitHub to include an additional flag, "use_null_conditions_rg". You can use this flag as follows:
bhr_cor<-BHR(mode = "bivariate", trait1_sumstats = d_sumstats, trait2_sumstats = v_sumstats, annotations = list(baseline_ori), num_null_conditions = 5, use_null_conditions_rg = TRUE)
At a quick glance, it looks like the BHR rg estimates are now more reasonable for the sumstats you provided (although unfortunately still a bit underpowered, which is in line with our analyses in this N range; see bipolar + schizophrenia analyses from our paper). Can you download the updated BHR functions and let me know if this works for you? Thanks!
Thanks, ajaynadig! It makes perfect sense! This happens when heritability is small and non-significant. I will try the new code. Thanks again for the help.
Closing this issue for now, feel free to reopen if other issues arise.
Dear BHR team,
I am trying to estimate the genetic correlations across traits and have encountered some problems with the output. I want to estimate the genetic correlations between 100 phenotype pairs for pLoFs and here is the code.
Below is the result (first 30 pairs for example).
There seem to be many large SEs and NAs. All of the traits are quantitative, and the sample size for my variant-level summary statistics is about 25k. I have included all the singleton sites in the estimation.
Below is the warning information
I have checked my input files and have not found a solution yet. Let me know if there are any issues I have missed. Thank you in advance!
Best, Yijun