ajaynadig / bhr

Suite of heritability and genetic correlation estimation tools for exome-sequencing data
MIT License

Dealing with ultra-rare variants that have been collapsed by SAIGE-GENE #10

Closed oalavijeh closed 1 year ago

oalavijeh commented 1 year ago

Dear Ajay and Dan,

I am now analysing data from a SAIGE-GENE analysis we've done with Genomics England data, but I am getting very high or very low lambdas and generally absurdly high h2 estimates (>0.3, which go above 1 when liability-adjusted). I was wondering:

  1. if you had to deal with ultra-rare variant information that has been collapsed together e.g.:

image

Some of the markerIDs are extremely long. Initially I separated them out, but this gives me lots of variants with exactly the same AC/AF/BETA etc., which I thought would bias my results. So I then treated each long ultra_rare-tagged markerID as a single variant, labelled with the first ID in its list.

  2. how do you tell whether the betas have been allele-standardised? In your wrangling example script you mention "per-allele are the usual betas reported by an exome wide association study", so I used the betas in the file (image), but this still gave me odd results. I then used your script, but this equally gave me odd results, as described above.

Sorry for all the questions; I am keen to get this working but am a relative novice at this!

All the best

Omid

ajaynadig commented 1 year ago

Hi Omid,

If by "collapsed together" you mean that these are summary statistics from burden tests that aggregate across many variants, unfortunately BHR cannot at present handle this input. BHR takes variant-level association statistics as input, not collapsed tests. Additionally, SAIGE by default performs logistic regression for binary (case/control) traits, which generates betas in units of log(OR), whereas BHR expects linear regression betas. This difference in units matters, and may be contributing to your odd results.

(We recognize that it would be ideal to go directly from SAIGE-GENE to BHR, and are currently working on the idea of a "BHR" mode in SAIGE that outputs summary statistics ready for BHR. However, it will be some time before we can implement this. If you are curious, the primary reasons why this is difficult at present are that 1) SAIGE-GENE uses different burden weights than the BHR default, and 2) SAIGE-GENE does not incorporate null burden statistics as in BHR.)

For the present, we recommend that you calculate case/control betas yourself from the case and control allele frequencies, as we describe in the methods of our paper:

image
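The formula in the image above is not reproduced in this thread, so the following is only a minimal sketch of one standard way to get a linear-regression beta from case and control allele counts. It uses the identity that the per-standard-deviation beta from regressing a 0/1 phenotype on genotype equals their correlation, with genotype variance approximated as 2·AF·(1−AF). The function name `per_sd_beta` and its arguments are hypothetical, not part of BHR; check the result against the methods of the paper before relying on it.

```python
import math


def per_sd_beta(ac_case, ac_ctrl, n_case, n_ctrl):
    """Per-sd linear beta for one variant from case/control allele counts.

    Hypothetical helper, not a BHR function. Computes
    cov(genotype, phenotype) / (sd(genotype) * sd(phenotype))
    for a 0/1 phenotype, assuming diploid genotypes.
    """
    n = n_case + n_ctrl
    prevalence = n_case / n                 # sample prevalence
    af_case = ac_case / (2 * n_case)        # allele frequency in cases
    af = (ac_case + ac_ctrl) / (2 * n)      # allele frequency overall
    if af <= 0 or af >= 1:
        raise ValueError("variant is monomorphic in this sample")

    # cov(G, Y) = E[G*Y] - E[G]*E[Y] = 2*K*AF_case - 2*AF*K
    cov_gy = 2 * prevalence * (af_case - af)
    var_g = 2 * af * (1 - af)               # Hardy-Weinberg approximation
    var_y = prevalence * (1 - prevalence)   # variance of a 0/1 phenotype
    return cov_gy / math.sqrt(var_g * var_y)


# Illustrative cohort sizes (assumed): 1,000 cases and 9,000 controls.
beta_case_enriched = per_sd_beta(ac_case=5, ac_ctrl=0, n_case=1000, n_ctrl=9000)
beta_no_difference = per_sd_beta(ac_case=10, ac_ctrl=90, n_case=1000, n_ctrl=9000)
print(beta_case_enriched, beta_no_difference)
```

A variant seen only in cases gets a positive beta, and a variant with the same frequency in cases and controls gets a beta of zero. These betas are in linear-regression units, unlike the log(OR) values from a logistic model.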

P.S. For ultra-rare variants, it may not be that unusual to observe many variants with the same AC/AF/BETA. For example, if many of the variants are singletons, there are very few possible AC/AF/BETA values, which could lead to this pattern.
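To illustrate the point about singletons, here is a small sketch with made-up numbers. It assumes betas are computed from allele counts as the correlation between genotype and a 0/1 phenotype; `per_sd_beta`, the cohort sizes, and the singleton counts are all hypothetical. Because a singleton is carried either by one case or by one control, only two distinct betas can occur, however many singletons there are.

```python
import math


def per_sd_beta(ac_case, ac_ctrl, n_case, n_ctrl):
    """Hypothetical helper: genotype/phenotype correlation from allele counts."""
    n = n_case + n_ctrl
    k = n_case / n                          # sample prevalence
    af_case = ac_case / (2 * n_case)        # allele frequency in cases
    af = (ac_case + ac_ctrl) / (2 * n)      # allele frequency overall
    return 2 * k * (af_case - af) / math.sqrt(2 * af * (1 - af) * k * (1 - k))


N_CASE, N_CTRL = 1_000, 9_000  # assumed cohort sizes

# 500 made-up singletons: 60 observed in a case, 440 observed in a control.
singletons = [(1, 0)] * 60 + [(0, 1)] * 440

betas = [per_sd_beta(ac_case, ac_ctrl, N_CASE, N_CTRL)
         for ac_case, ac_ctrl in singletons]
distinct = sorted(set(betas))

print(f"{len(betas)} variants, {len(distinct)} distinct beta values")
```

The same reasoning applies to doubletons and other very low allele counts: the number of possible (case count, control count) splits is small, so repeated AC/AF/BETA rows are expected rather than a sign of duplication.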

Hope that answers your question--please let me know if not.