ajaynadig / bhr

Suite of heritability and genetic correlation estimation tools for exome-sequencing data
MIT License
31 stars 6 forks source link

NaNs produced in output for genetic correlation #11

Closed cjfei18 closed 1 year ago

cjfei18 commented 1 year ago

Dear BHR team, I encountered problems that are similar to which is mentioned in Issue 6 when calculating genetic correlations, and I really appreciate your quick and efficient response. After I tried with the updated rg function by setting num_null_conditions and use_null_conditions_rg, the results did get more reasonable. But somehow there were still warnings for some traits:

1: In sqrt(h2_trait1_jackknife * h2_trait2_jackknife) : NaNs produced
2: In sqrt(jackknife_heritabilities_covariancematrix[1, 1] + (genetic_covariance_random$intercept *  : NaNs produced

Besides, P value of the genetic correlation is not presented in output, and I wonder how to calculate the P value of genetic correlation according to the output, as some traits in Fig 5a of the paper had a rg_se greater than rg. Thanks a lot and looking forward to your reply!

ajaynadig commented 1 year ago

Hi cjfei18,

Thank you for your interest in BHR.

Those NaNs arise in a line of code that takes the square root of heritability estimates for each of the 100 jackknife blocks into which we divide the genome, in order to produce a jackknife estimate of the rg standard error. The NaNs probably arise when some of the block-wise estimates of heritability are negative, which can happen in regression-based estimators such as BHR. Long story short: this error should not affect your point estimates. Thank you for raising this situation, which may affect other users as well. We should probably replace this jackknife estimator with a delta method estimator, which would not suffer from this problem. @danjweiner @lukejoconnor

As far as P values, we have not implemented a method for computing p values within BHR, although this is something we may do in the future. You may consider implementing a simple wald test by dividing the estimate by the standard error, and then calculating the p value using the CDF of the standard normal distribution, e.g.

p_value = 2 * pnorm(rg/rg_se)

Taking care to adjust the lower.tail argument appropriately depending on the sign of your rg estimate.

Larger SEs than point estimates can certainly happen in the setting of low power, as you observed in Figure 5 at our paper. At present, we are unfortunately seeing that the sample sizes of of typical exome-sequencing studies may be underpowered for burden genetic correlation analysis. This often arises in rg estimation, as rg is very sensitive to estimation error in the heritabilities, especially for traits with low heritability (i.e. noisy small denominator -> big variability of correlation).

Hope that helps!

ajaynadig commented 1 year ago

Hi cjfei18--I'm going to close this issue, but please feel free to reopen if you still have any questions.

ajaynadig commented 5 months ago

cjfei18--I just want to alert you to a small bug fix I made to the implementation of the delta method estimator, which may have lead to small downwards bias of the standard error calculation (for example, in the UKB height/BMI comparison, the fix changed the subthreshold rg standard error from 0.07 to 0.10). The issue is fixed now, and I would recommend re-downloading the package and re-analyzing your data.