ajaynadig / bhr

Suite of heritability and genetic correlation estimation tools for exome-sequencing data
MIT License
31 stars 6 forks source link

Can bhr be used to compare heritability enrichment for different variant categories in one gene? #14

Closed hoangthienan95 closed 1 year ago

hoangthienan95 commented 1 year ago

I'm looking at different types of missense variants in one gene INSR (gain Cys/Met vs not). When I run the code below with trait1_sumstats = test_file (example file provided in the Github), everything works fine, but when I run it with trait1_sumstats = sumstats_gainCysMet, I get error:

Code:

devtools::install_github("ajaynadig/bhr")
library("tidyverse")
sumstats_gainCysMet<- read_tsv("path/to/categorical_120_both_sexes_1_-low_0_high_1e-05-GainCysMet.txt")
sumstats_notCysMet<- read_tsv("path/to/categorical_120_both_sexes_1_-low_0_high_1e-05-NotGainCysMet.txt")
baseline_model <- read.table("path/to/ms_baseline_oe5.txt")
test_file <- read.table("path/to/github_ms_variant_ss_400k_pLoF_group1_21001NA_BMI_munged.txt")
bhr::BHR(mode = "univariate", 
    trait1_sumstats = sumstats_notCysMet,
    annotations = list(baseline_model))

Error

Burden Heritability Regression
Daniel Weiner, Ajay Nadig, and Luke O'Connor, 2023
Running BHR at 2023-07-12 10:49:47
For trait 1, MAF ranges from 1.3e-06 to 8.9e-06
...seems reasonable
Lambda GC:  0.02356
Lambda GC seems low. Please check input columns against documentation
1 genes in the BHR regression, 0 of which are significant fixed effects
...seems reasonable
Error in t(X[block == b, ]) %*% y[block == b] : non-conformable arguments

Ignoring that there might not be enough heritability captured in such few SNPs which I assume was detected by the Lambda GC (454 for sumstats_gainCysMet and 54 for sumstats_notCysMet), the only obvious difference between my files and the test_file is the number of genes.

I'm pretty unfamiliar with R and the error seems cryptics so I can't find the line that is causing the error (google says "attempt to multiply two matrices but the number of columns in the left matrix does not match the number of rows in the right matrix")

I have uploaded all the files here in case it's helpful

categorical_120_both_sexes1-low_0_high_1e-05-GainCysMet.txt categorical_120_both_sexes1-low_0_high_1e-05-NotGainCysMet.txt

ajaynadig commented 1 year ago

Hello, Thanks for your interest in BHR.

BHR is, at it's core, a regression of effect sizes across genes. Specifically, BHR estimates the slope that relates gene-wise burden effect sizes to burden scores (~ related to combined allele frequency).

As a result, BHR cannot compute the heritability for individual genes--when you tried to do so, BHR attempted to fit a regression to one data point, and failed.

There is a quick-and-dirty approach you can use to get heritability estimates that you seek. If you run BHR on all genes, and include the INSR variants only from the group of interest (e.g. two separate runs, one with the INSR GainCysMet estimates and one with the INSR NotGainCysMet estimates), and supply the argument fixed_genes = c("INSR") (also potentially including other very large effect genes), you will then have gene-wise heritability estimates in the significant genes section of the output. However, I will caution you that we have never used this approach in our own work, and I can make no guarantees on it's appropriateness for your scientific questions.

Moreover, BHR is, in general, designed to make general statements about all genes versus specific statements about particular genes. You may consider instead using a software like SAIGE-GENE+, where you could run INSR burden tests with the different sets of variants, and convert the resulting effect sizes estimates into heritability estimates.

Hope that helps!

ajaynadig commented 1 year ago

Closing issue, please let us know if you have any additional questions