Atkinson-Lab / Tractor

Scripts for implementing the Tractor pipeline
MIT License

Example Hail code for binary traits #7

Closed: secastel closed this issue 2 years ago

secastel commented 3 years ago

Hi, thanks for creating a very useful method. I was wondering if you have example Hail code (similar to Tractor-Example-GWAS.py / Tractor-Example-GWAS.ipynb) for running Tractor on binary traits? The issue is that the hl.agg.linreg() Hail function used in the example code has no logistic-regression equivalent. There is hl.logistic_regression_rows(), but it accepts only a single entry-level predictor (x), and its covariates must be per-sample, so it's not possible to also include the haplotype counts or the non-index allele dosage. Of course one could implement this outside of Hail, but if you already have a solution it would be easier. Any insight would be very helpful.
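For anyone taking the "outside of Hail" route, here is a minimal pure-Python sketch of a per-variant logistic regression with several individual-level predictors, fit by Newton-Raphson (equivalently, iteratively reweighted least squares). It is not Tractor's code. The function names (`logistic_fit`, `_score_and_info`, `_solve`) are hypothetical, and each row of `X` is assumed to be one individual's predictor vector at one variant, for example [1.0, haplotype count, ancestry-0 dosage, ancestry-1 dosage, covariates...].

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(t: float) -> float:
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    # Solve a @ x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _score_and_info(X, y, beta):
    # Score vector X'(y - p) and Fisher information X'WX, with W = p(1 - p).
    k = len(beta)
    score = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for xi, yi in zip(X, y):
        p = _sigmoid(sum(b * v for b, v in zip(beta, xi)))
        w = p * (1.0 - p)
        for j in range(k):
            score[j] += (yi - p) * xi[j]
            for c in range(k):
                info[j][c] += w * xi[j] * xi[c]
    return score, info


def logistic_fit(X: Sequence[Sequence[float]], y: Sequence[int],
                 max_iter: int = 50, tol: float = 1e-10) -> Tuple[List[float], List[float]]:
    """Fit logit P(y=1) = X @ beta by Newton-Raphson; return (beta, Wald SEs).

    Each row of X must include a leading 1.0 for the intercept.
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        score, info = _score_and_info(X, y, beta)
        step = _solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Wald SEs: square roots of the diagonal of the inverse information matrix.
    _, info = _score_and_info(X, y, beta)
    se = []
    for j in range(k):
        e_j = [1.0 if i == j else 0.0 for i in range(k)]
        se.append(math.sqrt(_solve(info, e_j)[j]))
    return beta, se
```

In a real analysis this would be called once per variant, with the predictor rows built from that variant's entry fields. Singular designs (for example, a predictor with no variation at a site) will cause a division by zero and would need to be filtered out first.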

Thanks, Stephane

eatkinson commented 3 years ago

Hi Stephane, yes, you are correct: there is no functionality for logistic regression on entries in Hail yet, though the Hail team is working on this feature. I will keep the wiki updated with information on that front. As alternatives, we have built non-Hail options, including joint-model implementations in R and plink. @nievergeltlab, can you add examples of the plink2 implementation you have tested to this repo and link to them here?

secastel commented 3 years ago

Thanks for getting back to me. It would be very helpful if you could provide some example code for R and plink. I'll keep a lookout for a Hail implementation of the required function.

eatkinson commented 3 years ago

Hi Stephane,

Adam would have the most up to date code for this. Adam, could you send an example of the plink2 joint model implementation and/or add it to the repo?

Thanks so much! Elizabeth


eatkinson commented 3 years ago

Alternatively, you can use the plink deconvolved-tract option that we describe in the wiki. The results will be very similar, though this approach does not fit the Tractor joint model. I will ping Adam again about adding plink2 examples. Thanks! Elizabeth
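The idea behind the deconvolved (partial) files can be sketched in a few lines. This is a minimal illustration, not the wiki's script: it assumes phased diploid genotypes written as "a|b", one local-ancestry call per haplotype, and that haplotypes assigned to another ancestry are masked as missing ("."). The function name `deconvolve_genotype` is hypothetical; check the wiki for how Tractor actually writes these files.

```python
from typing import Dict, Sequence


def deconvolve_genotype(gt: str, hap_ancestry: Sequence[int], n_anc: int = 2) -> Dict[int, str]:
    """Split one phased genotype (e.g. "0|1") into per-ancestry partial calls.

    Alleles on haplotypes assigned to a different ancestry are replaced with
    "." (missing), so each ancestry's output keeps only its own tracts.
    """
    alleles = gt.split("|")
    if len(alleles) != len(hap_ancestry):
        raise ValueError(f"expected {len(hap_ancestry)} phased alleles, got {gt!r}")
    return {
        k: "|".join(a if anc == k else "." for a, anc in zip(alleles, hap_ancestry))
        for k in range(n_anc)
    }
```

For example, an individual with genotype "0|1" whose first haplotype is called ancestry 0 and second ancestry 1 contributes "0|." to the ancestry-0 file and ".|1" to the ancestry-1 file.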

eatkinson commented 2 years ago

Hi again Stephane!

To have alternative options described here for others who may have had similar questions:

As of this writing, Hail doesn't have logistic regression implemented on entries yet, though the developers tell me this is in the works. Because the joint model has multiple Xs (rather than the single genotype used in a traditional GWAS), we can't run the GWAS on rows alone; we have to run it on the row-by-individual 'entry'. So the initial steps of the Tractor Hail code annotate an entry array, for each person at each site in their genome, containing all of the relevant terms in the joint model. Since there isn't logistic regression on entries in Hail yet, we suggest the following alternatives:
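To make the per-entry array concrete, here is a minimal sketch of the terms for one individual at one site, assuming two-way admixture, phased biallelic genotypes coded 0/1 per haplotype, and one local-ancestry call per haplotype. The function name `joint_model_row` is hypothetical, and counting copies of ancestry 1 (rather than ancestry 0) is an assumption; Tractor's own code may index the other ancestry. In Hail these values would be entry fields passed as the `x` list of `hl.agg.linreg`.

```python
from typing import List, Sequence, Tuple


def joint_model_row(gt: Tuple[int, int], hap_ancestry: Tuple[int, int],
                    covariates: Sequence[float] = ()) -> List[float]:
    """Predictor row for one individual at one variant (two-way admixture).

    gt: phased alleles (0 = ref, 1 = alt), one per haplotype.
    hap_ancestry: local-ancestry call (0 or 1) for the same two haplotypes.
    """
    if len(gt) != 2 or len(hap_ancestry) != 2:
        raise ValueError("expected a phased diploid genotype and two ancestry calls")
    # Copies of ancestry 1 at this site. The ancestry-0 count is 2 minus this,
    # so including both would be collinear with the intercept.
    hapcount1 = sum(1 for anc in hap_ancestry if anc == 1)
    # Alt-allele dosage carried on each ancestry's haplotypes.
    dos0 = sum(a for a, anc in zip(gt, hap_ancestry) if anc == 0)
    dos1 = sum(a for a, anc in zip(gt, hap_ancestry) if anc == 1)
    return [1.0, float(hapcount1), float(dos0), float(dos1), *covariates]
```

Stacking these rows across individuals gives the design matrix for that site; a logistic fit on it is the step Hail cannot yet do on entries.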

1) Run logistic regression on the deconvolved VCF files in plink; code for this is provided at the bottom of the GWAS wiki page: https://github.com/eatkinson/Tractor/wiki/Step-3:-Tractor-GWAS. Running this on the partial VCF for each ancestry gives results nearly equivalent to what you would get for the corresponding term in the joint model.

2) Run linear regression and then transform the estimates. Depending on your application, the results are close enough to logistic regression in terms of p-values and effect estimates. There is a substantial literature justifying this approach (the logistic and normal distributions are approximately the same after a scale transformation). Here is an example in the GWAS context, designed for SNPs with small effects: https://www.nature.com/articles/ejhg2016150#Sec2

effect logistic ≈ effect linear / (μ(1 − μ)), where μ is the proportion of cases (equal to the linear-model intercept when all predictors are mean-centered)
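As a minimal sketch of the transform in option 2, the function below divides a linear-model effect on a 0/1 phenotype by μ(1 − μ), the usual form of this small-effect approximation (it follows from the slope of the logistic curve at probability μ). Scaling the standard error by the same factor is an assumption carried over from that approximation; note that it leaves the z-statistic, and hence the p-value, unchanged. The function name is hypothetical.

```python
from typing import Tuple


def linear_to_log_odds(beta: float, se: float, case_fraction: float) -> Tuple[float, float]:
    """Approximate log-odds effect and SE from a linear fit to a 0/1 phenotype.

    Valid only for small effects; case_fraction is the proportion of cases (mu).
    """
    if not 0.0 < case_fraction < 1.0:
        raise ValueError("case_fraction must be strictly between 0 and 1")
    # Slope of the logistic curve at probability mu is mu * (1 - mu).
    scale = case_fraction * (1.0 - case_fraction)
    return beta / scale, se / scale
```

For example, with 25% cases the scale factor is 0.1875, so a linear effect of 0.01875 maps to a log-odds effect of about 0.1 (odds ratio about 1.105).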

Hope this helps! Elizabeth

secastel commented 2 years ago

Thanks for the detailed follow-up, Elizabeth! For now we've been running it with linear regression. We'll revisit this once Hail is updated with the necessary logistic regression function.