ajaynadig / bhr

Suite of heritability and genetic correlation estimation tools for exome-sequencing data
MIT License
31 stars, 6 forks

Run Genetic correlation analysis through different annotation source #19

Closed: Richard1F closed this issue 9 months ago

Richard1F commented 9 months ago

Dear BHR team,

Thanks for developing such a useful tool! As I try to make sense of my own analysis, I wonder whether it is reasonable to use the bivariate mode of BHR to estimate the genetic correlation between two traits in the same annotation-MAF bin when the two sets of summary statistics were annotated by different sources. For example, I want to analyze the genetic correlation between my own traits and one downloaded from Genebass for ultra-rare LoFs. However, the annotation pipelines differ (the LoF variants were derived from distinct sources). Intuitively this seems reasonable, since most of the LoF variants should overlap, but I would like to double-check.

Thanks!

Richard

ajaynadig commented 9 months ago

Hi Richard,

Thanks for your kind words and interest in BHR.

Theoretically, this is one of the goals of BHR--to be able to jointly analyze summary statistics from different studies, even if the decisions made during the individual studies were slightly different, as is the case for e.g. LDSC analyses of various common variant GWAS. I would wager that among slightly different pipelines for annotating LoF variants, there won't be differences that lead to substantial changes in BHR output; if I were a reviewer, I would not raise an issue with this analysis, provided that the LoF annotation methods were well-validated.

That being said, if you have some promising results, it may be worthwhile to pursue an analysis that attempts to harmonize the methods. Because variant-level sumstats from Genebass are available, I wonder whether you may be able to perform LoF annotation with your own protocol, prior to aggregating variant-level results into burden test summary statistics. Or, equally, you may consider matching the annotation pipeline for your in-house data to the Genebass methods (from Karczewski et al, 2022, Cell Genomics), and see if the results hold up.
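As a rough illustration of the first option, the sketch below keeps only the variant-level rows that your own protocol calls LoF and that fall in the ultra-rare bin, then groups them by gene ahead of any burden aggregation. The row fields (`variant`, `gene`, `af`, `beta`), the variant IDs, and the `max_af` cutoff are all placeholders chosen for the example; they are not the Genebass schema or BHR's input format, and the aggregation step itself is left out.

```python
from collections import defaultdict


def filter_and_group(variant_rows, lof_variants, max_af=1e-5):
    """Keep rows that are LoF under your own annotation and rarer than
    max_af, then group the surviving rows by gene.

    variant_rows: iterable of dicts with hypothetical keys
                  "variant", "gene", "af", "beta".
    lof_variants: set of variant IDs your own pipeline calls LoF.
    max_af:       illustrative allele-frequency cutoff for the bin.
    """
    by_gene = defaultdict(list)
    for row in variant_rows:
        if row["variant"] in lof_variants and row["af"] < max_af:
            by_gene[row["gene"]].append(row)
    return dict(by_gene)


# Toy variant-level sumstats (made-up values, for illustration only).
rows = [
    {"variant": "1:1000:A:T", "gene": "GENE1", "af": 2e-6, "beta": 0.4},
    {"variant": "1:1050:G:C", "gene": "GENE1", "af": 5e-6, "beta": 0.1},
    {"variant": "2:2000:C:T", "gene": "GENE2", "af": 3e-3, "beta": -0.2},
    {"variant": "2:2100:T:G", "gene": "GENE2", "af": 1e-6, "beta": 0.3},
]

# Variants that your own annotation protocol labels as LoF.
my_lof = {"1:1000:A:T", "2:2000:C:T", "2:2100:T:G"}

grouped = filter_and_group(rows, my_lof)
for gene, kept in sorted(grouped.items()):
    print(gene, [r["variant"] for r in kept])
```

Applying the same filter to both the downloaded and the in-house sumstats would put the two traits on a common LoF definition before they go into the bivariate analysis.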

If you do so, please let us know about the results, as it would be helpful for us to know about how BHR behaves with variability in LoF annotation.

Richard1F commented 9 months ago

Hi,

Thanks for your prompt reply; that makes sense. I would also like to adopt the same annotation procedure as Karczewski et al, 2022, Cell Genomics. However, it appears to me that there are some issues with the LOFTEE plugin for LoF annotation under the GRCh38 branch (although it works well with the GRCh37 branch), so to harmonize the annotation methods I might have to do a liftover, which would in turn introduce some slight differences. This is also the reason why I chose another pipeline.

But your answer reminds me that I could obtain all the LoF variants directly from Genebass: randomly select a trait and download its pLoF variant-level summary statistics file through Hail, which is equivalent to manually running the LoF annotation with the same pipeline as Genebass!

Let me know if there is any issue in my statement and I will continue working on the analysis. Thanks again!

ajaynadig commented 9 months ago

The only concern with that approach is that many of the variants in your dataset may not be observed in the Genebass dataset, in which case you would not know the Genebass annotation. If the overlap is very good, then this may not be an issue.
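One way to gauge that overlap is sketched below: it reports the fraction of in-house variants that also appear in the Genebass variant list, along with the variants that would be left without a Genebass annotation. The function name and the `chrom:pos:ref:alt` IDs are made up for the example, and it assumes both lists are on the same genome build with alleles represented the same way.

```python
def annotation_coverage(in_house_variants, genebass_variants):
    """Return (fraction of in-house variants also seen in Genebass,
    sorted list of in-house variants missing from Genebass)."""
    in_house = set(in_house_variants)
    shared = in_house & set(genebass_variants)
    missing = in_house - shared
    fraction = len(shared) / len(in_house) if in_house else 0.0
    return fraction, sorted(missing)


# Toy variant ID lists (hypothetical, for illustration only).
mine = ["1:1000:A:T", "1:1050:G:C", "2:2100:T:G", "3:500:G:A"]
genebass = ["1:1000:A:T", "2:2100:T:G", "9:1:A:C"]

fraction, missing = annotation_coverage(mine, genebass)
print(f"covered: {fraction:.0%}, unannotated: {missing}")
```

If the covered fraction is low, the unannotated variants would still need an annotation from some other pipeline, which brings back the original mismatch.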

Richard1F commented 9 months ago

Thanks for the clarification. I will check these details and see how the results look. I will close the issue for now.