Closed jhylwq123 closed 9 months ago
Hello,
1)
A high lambda GC suggests that there is a suspiciously large amount of signal in your summary statistics, most likely due to confounding. While BHR controls for some flavors of confounding, it is not perfect; here are some example situations where one could observe inflated heritability:
2) Between the high lambda GC, the wide MAF range, and the relatively low N (for example, in our main paper, we barely had significant heritabilities for schizophrenia and bipolar disorder at N = ~ 10k cases and 10k controls), I think that this heritability estimate is suspect.
Please note that a great way to QC BHR analyses is to repeat the analysis using only synonymous variants. Synonymous variants are an amazing internal control; if you observe 0 synonymous burden heritability but nonzero pLoF burden heritability, this would greatly increase my confidence in your analysis. Otherwise, if you observe significantly nonzero synonymous burden heritability, that is a good sign that something has gone wrong in your analysis.
3) No, it is generally not necessary to change the baseline_model downloaded from the website, save for situations where you want to change gene nomenclature.
Hope that helps!
Thank you so much for these valuable suggestions! I checked my data and did another round of quality control to decrease the lambda GC. However, there was barely any improvement. QC steps
Then I run BHR to calculate heritability using only synonymous variants(~12000 genes)
s1_bhr_syn_model <- BHR(trait1_sumstats = s1_sumstats_syn, annotations = list(baseline_model), #baseline model including constraint annotations num_blocks = 1000, #number of blocks for jackknife mode = "univariate", genomewide_correction = TRUE) Burden Heritability Regression Daniel Weiner, Ajay Nadig, and Luke O'Connor, 2023 Running BHR at 2023-09-23 14:31:51 For trait 1, MAF ranges from 0.0013 to 0.0088 ...seems reasonable Running GWC based on all genes Mean genome-wide per allele effect size: 0.0092 Lambda GC: 2.144 Lambda GC seems high. Please check input columns against documentation 12503 genes in the BHR regression, 0 of which are significant fixed effects ...seems reasonable print(paste0("S1 syn Heritability: ", round(s1_bhr_syn_model$mixed_model$heritabilities[1,5] bp_scalingfactor, 4), " ", "(", round(s1_bhr_syn_model$mixed_model$heritabilities[2,5] bp_scalingfactor, 4), ")")) "S1 syn Heritability: 0.55 (0.221)"
Something must be wrong in my analysis, but I confused which step went wrong. I know that sample size is a major concern in my analysis, but it can't explain the inflated heritability in synonymous variants right?
Hello, Apologies for the delayed response. Although the point estimate looks inflated, I will note that a confidence interval for that estimate would include virtually all plausible values (95%: 0.12 to 0.98; 99%: -0.02 to 1.11). Notably, performing those QC steps has increased the standard error of your estimate substantially, and the estimate now seems not incompatible with sampling variation.
Unfortunately, it seems that datasets of this size are underpowered for BHR analyses, unless the trait of interest has an extraordinary genetic architecture. You may consider performing your exome-wide association study in a continuous biomarker for which a large dataset is available, if that is possible for your disease of interest.
Closing the comment now--please feel free to re-open if you have additional questions!
Hi, I tired to run BHR in univariate mode using a VCF file containing qualified pLOF(predicted loss of function) variants from 200 cases and 200 controls. The log file indicates Lambda GC is too high and the heritability seems to high to be true.
log file
My questions are (1)Is there any methods in BHR to control the lambda ? (2)Can the heritability resulted be trusted? (3)Is it necessary to change the baseline_model downloaded from the website?