ajaynadig / bhr

Suite of heritability and genetic correlation estimation tools for exome-sequencing data
MIT License
31 stars 6 forks source link

Heritability and lambda GC too high #18

Closed jhylwq123 closed 9 months ago

jhylwq123 commented 10 months ago

Hi, I tired to run BHR in univariate mode using a VCF file containing qualified pLOF(predicted loss of function) variants from 200 cases and 200 controls. The log file indicates Lambda GC is too high and the heritability seems to high to be true.

log file

s1_plof_bhr_model <- BHR(trait1_sumstats = s1_sumstats_plof, annotations = list(baseline_model), #baseline model including constraint annotations num_blocks = 100, #number of blocks for jackknife mode = "univariate") Burden Heritability Regression Daniel Weiner, Ajay Nadig, and Luke O'Connor, 2023 Running BHR at 2023-09-19 16:29:10 For trait 1, MAF ranges from 0.0013 to 0.3 MAF ranges over more than two orders of magnitude for trait 1. We recommend splitting into finer MAF bins. See documentation Lambda GC: 2.184 Lambda GC seems high. Please check input columns against documentation 3956 genes in the BHR regression, 5 of which are significant fixed effects ...seems reasonable BHR finished at 2023-09-19 16:29:12 print(paste0("S1 pLOF Heritability: ", round(s1_plof_bhr_model$mixed_model$heritabilities[1,5] bp_scalingfactor, 4), " ", "(", round(s1_plof_bhr_model$mixed_model$heritabilities[2,5] bp_scalingfactor, 4), ")")) [1] "S1 pLOF Heritability: 0.4726 (0.0932)"

My questions are (1)Is there any methods in BHR to control the lambda ? (2)Can the heritability resulted be trusted? (3)Is it necessary to change the baseline_model downloaded from the website?

ajaynadig commented 10 months ago

Hello,

1)
A high lambda GC suggests that there is a suspiciously large amount of signal in your summary statistics, most likely due to confounding. While BHR controls for some flavors of confounding, it is not perfect; here are some example situations where one could observe inflated heritability:

2) Between the high lambda GC, the wide MAF range, and the relatively low N (for example, in our main paper, we barely had significant heritabilities for schizophrenia and bipolar disorder at N = ~ 10k cases and 10k controls), I think that this heritability estimate is suspect.

Please note that a great way to QC BHR analyses is to repeat the analysis using only synonymous variants. Synonymous variants are an amazing internal control; if you observe 0 synonymous burden heritability but nonzero pLoF burden heritability, this would greatly increase my confidence in your analysis. Otherwise, if you observe significantly nonzero synonymous burden heritability, that is a good sign that something has gone wrong in your analysis.

3) No, it is generally not necessary to change the baseline_model downloaded from the website, save for situations where you want to change gene nomenclature.

Hope that helps!

jhylwq123 commented 10 months ago

Thank you so much for these valuable suggestions! I checked my data and did another round of quality control to decrease the lambda GC. However, there was barely any improvement. QC steps

  1. All participants are from the same population. I used KING(cut off 0.0884) to exclude relatedness.
  2. All variants had an average depth of >30X and genotype rate >98%.
  3. Only include variants MAF <0.01.

Then I run BHR to calculate heritability using only synonymous variants(~12000 genes)

s1_bhr_syn_model <- BHR(trait1_sumstats = s1_sumstats_syn, annotations = list(baseline_model), #baseline model including constraint annotations num_blocks = 1000, #number of blocks for jackknife mode = "univariate", genomewide_correction = TRUE) Burden Heritability Regression Daniel Weiner, Ajay Nadig, and Luke O'Connor, 2023 Running BHR at 2023-09-23 14:31:51 For trait 1, MAF ranges from 0.0013 to 0.0088 ...seems reasonable Running GWC based on all genes Mean genome-wide per allele effect size: 0.0092 Lambda GC: 2.144 Lambda GC seems high. Please check input columns against documentation 12503 genes in the BHR regression, 0 of which are significant fixed effects ...seems reasonable print(paste0("S1 syn Heritability: ", round(s1_bhr_syn_model$mixed_model$heritabilities[1,5] bp_scalingfactor, 4), " ", "(", round(s1_bhr_syn_model$mixed_model$heritabilities[2,5] bp_scalingfactor, 4), ")")) "S1 syn Heritability: 0.55 (0.221)"

Something must be wrong in my analysis, but I confused which step went wrong. I know that sample size is a major concern in my analysis, but it can't explain the inflated heritability in synonymous variants right?

ajaynadig commented 9 months ago

Hello, Apologies for the delayed response. Although the point estimate looks inflated, I will note that a confidence interval for that estimate would include virtually all plausible values (95%: 0.12 to 0.98; 99%: -0.02 to 1.11). Notably, performing those QC steps has increased the standard error of your estimate substantially, and the estimate now seems not incompatible with sampling variation.

Unfortunately, it seems that datasets of this size are underpowered for BHR analyses, unless the trait of interest has an extraordinary genetic architecture. You may consider performing your exome-wide association study in a continuous biomarker for which a large dataset is available, if that is possible for your disease of interest.

Closing the comment now--please feel free to re-open if you have additional questions!