ajaynadig / bhr

Suite of heritability and genetic correlation estimation tools for exome-sequencing data
MIT License
31 stars 6 forks source link

Interpret the distinct behaviors of synonymous burden heritability across MAF bins #20

Open Asukayj opened 3 months ago

Asukayj commented 3 months ago

Dear BHR team,

Thanks for previously helping diagnose the issue regarding burden correlation function! Now I met a new issue when analyzing the heritability of my traits.

As mentioned in the paper and in other issues, synonymous variants can be used as negative control as expected and it does make sense. In my results, the distribution of ultra-rare synonymous burden heritability indeed looks like null distribution (MAF<1e-4 for my setting); however, the synonymous heritability from the rare MAF bin (1e-4<MAF<1e-3) seems to be significantly positive and even more heritable than plof in the same MAF cutoff, which may be strange. See the results below:

synonymous burden for 1e-4<MAF<1e-3 that has strange behavior: phe_no h2 h2_se ratio 42 0.0120560451497789 0.00305394747737485 3.94769236835147 43 0.0108853763210799 0.00292529684191412 3.72111854260821 44 0.00711318006609169 0.00251459806870635 2.82875428666464 45 0.00423410954092587 0.00215240761157615 1.96715042176669 46 0.00627451953435611 0.00281719932656859 2.22721888195203

[1] Burden Heritability Regression Daniel Weiner, Ajay Nadig, and Luke O'Connor, 2023 Running BHR at 2024-03-13 08:03:42 For trait 1, MAF ranges from 1e-04 to 0.001 ...seems reasonable Lambda GC: 0.9196 ...seems reasonable 14979 genes in the BHR regression, 0 of which are significant fixed effects ...seems reasonable BHR finished at 2024-03-13 08:04:12 [1] Burden Heritability Regression Daniel Weiner, Ajay Nadig, and Luke O'Connor, 2023 Running BHR at 2024-03-13 08:04:41 For trait 1, MAF ranges from 1e-04 to 0.001 ...seems reasonable Lambda GC: 0.9519 ...seems reasonable 14986 genes in the BHR regression, 0 of which are significant fixed effects ...seems reasonable BHR finished at 2024-03-13 08:05:07 [1] Burden Heritability Regression Daniel Weiner, Ajay Nadig, and Luke O'Connor, 2023 Running BHR at 2024-03-13 08:05:35 For trait 1, MAF ranges from 1e-04 to 0.001 ...seems reasonable Lambda GC: 0.7515 ...seems reasonable 14998 genes in the BHR regression, 0 of which are significant fixed effects ...seems reasonable BHR finished at 2024-03-13 08:06:01 [1] Burden Heritability Regression Daniel Weiner, Ajay Nadig, and Luke O'Connor, 2023 Running BHR at 2024-03-13 08:06:26 For trait 1, MAF ranges from 1e-04 to 0.001 ...seems reasonable Lambda GC: 0.7859 ...seems reasonable 14996 genes in the BHR regression, 0 of which are significant fixed effects ...seems reasonable BHR finished at 2024-03-13 08:06:52 [1] Burden Heritability Regression Daniel Weiner, Ajay Nadig, and Luke O'Connor, 2023 Running BHR at 2024-03-13 08:07:19 For trait 1, MAF ranges from 1e-04 to 0.001 ...seems reasonable Lambda GC: 0.9373 ...seems reasonable 14993 genes in the BHR regression, 1 of which are significant fixed effects ...seems reasonable BHR finished at 2024-03-13 08:07:43

synonymous burden for ultra-rare that indeed looks like null distribution: phe_no h2 h2_se ratio 42 0.000634245862916548 0.00258357040208983 0.24549199913558 43 -0.00106635086888942 0.00245518390542062 -0.434326270441534 44 0.0031640985269652 0.00200769276857674 1.57598741026907 45 0.00577225371860015 0.0024193778018831 2.38584222526444 46 -0.00010233767702685 0.0023530991793828 -0.0434905922893111

[1] Burden Heritability Regression Daniel Weiner, Ajay Nadig, and Luke O'Connor, 2023 Running BHR at 2024-03-13 08:03:42 For trait 1, MAF ranges from 1e-05 to 1e-04 ...seems reasonable Lambda GC: 0.9139 ...seems reasonable 16650 genes in the BHR regression, 0 of which are significant fixed effects ...seems reasonable BHR finished at 2024-03-13 08:04:13 [1] Burden Heritability Regression Daniel Weiner, Ajay Nadig, and Luke O'Connor, 2023 Running BHR at 2024-03-13 08:04:42 For trait 1, MAF ranges from 1e-05 to 1e-04 ...seems reasonable Lambda GC: 0.9447 ...seems reasonable 16650 genes in the BHR regression, 0 of which are significant fixed effects ...seems reasonable BHR finished at 2024-03-13 08:05:09 [1] Burden Heritability Regression Daniel Weiner, Ajay Nadig, and Luke O'Connor, 2023 Running BHR at 2024-03-13 08:05:35 For trait 1, MAF ranges from 1e-05 to 1e-04 ...seems reasonable Lambda GC: 0.7453 ...seems reasonable 16650 genes in the BHR regression, 0 of which are significant fixed effects ...seems reasonable BHR finished at 2024-03-13 08:06:04 [1] Burden Heritability Regression Daniel Weiner, Ajay Nadig, and Luke O'Connor, 2023 Running BHR at 2024-03-13 08:06:26 For trait 1, MAF ranges from 1e-05 to 1e-04 ...seems reasonable Lambda GC: 0.7765 ...seems reasonable 16650 genes in the BHR regression, 0 of which are significant fixed effects ...seems reasonable BHR finished at 2024-03-13 08:06:54 [1] Burden Heritability Regression Daniel Weiner, Ajay Nadig, and Luke O'Connor, 2023 Running BHR at 2024-03-13 08:07:17 For trait 1, MAF ranges from 1e-05 to 1e-04 ...seems reasonable Lambda GC: 0.9291 ...seems reasonable 16650 genes in the BHR regression, 0 of which are significant fixed effects ...seems reasonable BHR finished at 2024-03-13 08:07:43

For comparison, I also post the results for pLoF variants below, which seems to be fine.

plof burden for 1e-4<MAF<1e-3: phe_no h2 h2_se ratio 42 0.00260750627328991 0.00221044131969596 1.17963152880646 43 0.00439655502545361 0.00274201246007019 1.60340446641919 44 0.00650633842448688 0.00269888588734925 2.41074973009591 45 0.00479726243318101 0.00185740352820471 2.58277878788023 46 0.00240094579104089 0.00206990672235405 1.15992946209207

plof burden for ultra rare: phe_no h2 h2_se ratio 42 0.00766599933015673 0.00286614905320313 2.6746687586221 43 0.00997587507226722 0.00299288488760775 3.333197047963 44 0.00631590999441937 0.0029884048972104 2.11347197306332 45 0.00870547997889106 0.00262289081768516 3.31904016751033 46 0.00964920582891363 0.00432473528154199 2.23116681154949

As suggested, I also tried to set 'genomewide_correction=TRUE', but it does not work for my result: synonymous burden for 1e-4<MAF<1e-3 with genomewide correction: phe_no h2 h2_se ratio 42 0.0121104029996435 0.00307260761433261 3.94140890075023 43 0.0109109318176589 0.00292886172581063 3.72531476016984 44 0.00709486477043536 0.00251596719785649 2.81993532208207 45 0.00420825092062223 0.00214053623071131 1.96597976723982 46 0.00627486516788429 0.00281629513233141 2.22805667483073

For more details, I used Regenie to get the summary statistics, all the traits are continuous, and the population size is N = 40K. I appreciate any insight or understanding on the behavior of synonymous variants burden heritability!

Also, results of the ultra-rare variants seem to make sense and give me the confidence to believe them and I did not yet find other issues when checking the calculation process. But I just could not figure out what happened to the synonymous variants between 1e-4<MAF<1e-3 or maybe I missed something. Thanks!

Yijun

ajaynadig commented 3 months ago

Hello, Thank you for your detailed question. Can you please send the full BHR function command that you used, so that I can get a sense of which options you are using?

This is indeed puzzling, and I can't see anything obvious that would cause this result. A few questions for you: 1) Were these individuals sequenced on the same platform? Platform differences can lead to inflated synonymous heritability 2) Is it possible that there is minor-allele-biased stratification in your sample? That is, a population within your cohort that is both enriched for rare variation and different in the phenotypes of interest, perhaps a bottlenecked population?

However, I am skeptical that either of these would cause this pattern of effects, as these should equally effect the rarer MAF bin. If the above suggestions are not at play in your data, perhaps you can send me one of the problematic summary statistics and I'll take a look!

Asukayj commented 3 months ago

Hi,

I used the default mode for estimating heritability and tried the flag 'genomewide_correction = TRUE', see below: bhr_univariate <- BHR(mode = "univariate", trait1_sumstats = sumstats_1, annotations = list(baseline_ori))

bhr_univariate$mixed_model$heritabilities[,"total"] h2 h2_se 0.012056045 0.003053947

bhr_univariate <- BHR(mode = "univariate", trait1_sumstats = sumstats_1, annotations = list(baseline_ori), genomewide_correction = TRUE)

bhr_univariate$mixed_model$heritabilities[,"total"] h2 h2_se 0.012110403 0.003072608

For the questions, sequencing platform may not be the concern. Also note that the upward estimate is only observed for 'synonymous variants' in '1e-4<MAF<1e-3' while the genomewide_correction provides few adjustment, so can I say there is not minor-allele-biased stratification? Or I can look at it if there is a way to do that.

I just sent the summary statistics and look forward to your thoughs!