gamazonlab / MR-JTI

MIT License

About the MR-JTI input file (issue #9)

Open · forget999 opened this issue 1 year ago

forget999 commented 1 year ago

Hi, thanks for the great approach! I am confused about how to prepare the input file for MR-JTI. How can I make sure that the exposure (cis-eQTL) and the outcome (GWAS trait) use the same effect allele in the MR-JTI analysis? Is the TwoSampleMR R package applicable? Can I use GTEx v8 to obtain the effect allele and harmonize variants between the two datasets? If possible, could you share some examples of pruning SNPs and preparing the input file for MR-JTI? I'm really looking forward to your response, thanks!

forget999 commented 1 year ago

@zdangm @egamazon

zdangm commented 1 year ago

Hi, thank you for your interest! If the eQTL and GWAS data report different effect alleles, you can flip the GWAS effect manually as follows:

df$gwas_beta_fliped = ifelse(df$eqtl_effect_allele == df$gwas_effect_allele, df$gwas_beta, df$gwas_beta * -1)

TwoSampleMR may also be able to generate this intermediate data frame for you.
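For readers not working in R, here is a minimal Python sketch of the same alignment. It assumes each SNP carries both alleles from each source (the argument names are hypothetical, not MR-JTI input columns). Unlike the one-liner, it flips the sign only when the GWAS alleles are exactly the eQTL alleles swapped, and it drops strand-ambiguous A/T and C/G SNPs and mismatched allele pairs rather than flipping them. Strand flips (e.g. A/G in one file, T/C in the other) are not resolved here.

```python
# Complementary bases, used to detect strand-ambiguous (palindromic) SNPs.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def harmonize_gwas_beta(eqtl_effect, eqtl_other, gwas_effect, gwas_other, gwas_beta):
    """Return the GWAS beta expressed relative to the eQTL effect allele.

    Returns None when the SNP cannot be aligned safely: strand-ambiguous
    (A/T or C/G) SNPs, or allele pairs that do not match at all.
    """
    eqtl_effect, eqtl_other = eqtl_effect.upper(), eqtl_other.upper()
    gwas_effect, gwas_other = gwas_effect.upper(), gwas_other.upper()

    # A/T and C/G SNPs read the same on both strands, so the sign is unknowable.
    if COMPLEMENT.get(eqtl_effect) == eqtl_other:
        return None
    if (gwas_effect, gwas_other) == (eqtl_effect, eqtl_other):
        return gwas_beta   # same effect allele: keep the sign
    if (gwas_effect, gwas_other) == (eqtl_other, eqtl_effect):
        return -gwas_beta  # alleles swapped: flip the sign
    return None            # alleles do not match: exclude the SNP
```

SNPs for which the function returns None would be excluded before building the MR-JTI input.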

Here is an example of LD clumping with PLINK:

plink --bfile xxxx --clump xxxxx --clump-field p_eQTL --clump-snp-field rsid --clump-p1 1 --clump-r2 0.1 --out xxxx
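To illustrate what the clumping step does, here is a simplified greedy clumping sketch in Python: SNPs are visited in order of increasing p-value, each unclaimed SNP with p at or below p1 becomes an index SNP, and every remaining SNP whose r² with it exceeds the threshold is absorbed into its clump. This is an illustration under assumptions, not a re-implementation of PLINK: the pairwise r² values are assumed to be supplied (e.g. computed from a reference panel), and PLINK's physical window (--clump-kb) and secondary threshold (--clump-p2) are ignored. The defaults mirror --clump-p1 1 and --clump-r2 0.1 from the command above.

```python
def greedy_clump(pvalues, r2, p1=1.0, r2_threshold=0.1):
    """Simplified greedy LD clumping.

    pvalues: dict rsid -> p-value (e.g. the eQTL p-value column).
    r2: dict frozenset({rsid_a, rsid_b}) -> r^2 from a reference panel;
        pairs not listed are treated as r^2 = 0.
    Returns the index SNPs in the order they were selected.
    """
    # Candidate SNPs sorted by p-value (ties broken by rsid for determinism).
    candidates = sorted((p, snp) for snp, p in pvalues.items() if p <= p1)
    index_snps = []
    claimed = set()
    for _, snp in candidates:
        if snp in claimed:
            continue  # already absorbed into a stronger SNP's clump
        index_snps.append(snp)
        claimed.add(snp)
        # Every unclaimed SNP in LD with this index SNP joins its clump.
        for _, other in candidates:
            if other not in claimed and r2.get(frozenset((snp, other)), 0.0) > r2_threshold:
                claimed.add(other)
    return index_snps
```

For example, with rs1 and rs2 in strong LD (r² = 0.8), only the more significant of the two survives as an index SNP.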

Please let me know if you have other questions. Dan

forget999 commented 1 year ago

Thank you very much for your response! I still have a few questions.

  1. Currently I only have the GTEx v8 eQTL file and the GWAS summary statistics in txt format. As far as I know, clumping SNPs with PLINK requires genotype files in bed/bim/fam format rather than txt, so TwoSampleMR might suit my current situation better. Do I have to use PLINK?
  2. For LD score calculation, I am more familiar with LDSC (https://github.com/bulik/ldsc), but you recommend gcta64. I am not sure what risk a different method brings to the LD scores. Can I use another method to calculate LD scores?
  3. Besides MR-JTI, I have some questions about JTI. The GTEx models trained with JTI can be applied in S-PrediXcan to run a TWAS. Do I need to perform harmonization and imputation between the GTEx models and my GWAS (e.g. aligning allele orientation between GTEx and my GWAS) before applying the JTI models in S-PrediXcan? I ask because the standalone S-PrediXcan workflow includes harmonization and imputation of the GWAS data.

As a beginner, I am very interested in JTI and MR-JTI and would like to use them as core methods in my research. I am very much looking forward to your reply, thanks!

zdangm commented 1 year ago

Hi,

  1. For clumping, you can use a reference genotype dataset instead. Genotype files for the 1000 Genomes Project are available from the PLINK resources page: https://www.cog-genomics.org/plink/1.9/resources#phase1 or https://www.cog-genomics.org/plink/2.0/resources#phase3_1kg Note: please use ancestry-matched samples.
  2. Sure. The original LDSC computes the same LD scores; from my perspective the choice makes no difference.
  3. You don't have to worry about harmonization. Both the Python version of S-PrediXcan and the R version (https://github.com/gamazonlab/MR-JTI/blob/master/model_training/predixcan/src/run.sh) flip the alleles automatically for you. Dan
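For reference, the LD score of a SNP is the sum of its r² with nearby SNPs, including itself. The Python sketch below computes that definition directly from a small genotype dosage matrix. It is a simplification: the window is a count of neighbouring SNPs on each side (similar in spirit to ldsc's --ld-wind-snps option), and r² is the plain sample estimate, whereas ldsc applies a small-sample correction to r². Inputs must be polymorphic, since r is undefined for a constant genotype vector.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ld_scores(genotypes, window):
    """LD score per SNP: sum of r^2 with SNPs within `window` positions on each side.

    genotypes: list of per-SNP dosage vectors (0/1/2 per sample), ordered by position.
    The SNP itself is included (r^2 = 1 with itself).
    """
    m = len(genotypes)
    scores = []
    for j in range(m):
        lo, hi = max(0, j - window), min(m, j + window + 1)
        scores.append(sum(pearson_r(genotypes[j], genotypes[k]) ** 2 for k in range(lo, hi)))
    return scores
```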

forget999 commented 1 year ago

Thanks for your detailed response! I still have two questions.

  1. When implementing MR-JTI, how should I select eQTLs by p-value? I usually keep eQTLs with eqtl_p < 5e-08, but according to Mendelian randomization guidelines, I also need to make sure the eQTL is not associated with the outcome. So before running MR-JTI, do I need to require that the eQTL's gwas_p is greater than 5e-08?
  2. In the S-PrediXcan guide and example (https://github.com/hakyimlab/MetaXcan/wiki/Tutorial:-GTEx-v8-MASH-models-integration-with-a-Coronary-Artery-Disease-GWAS), the GWAS data are imputed in addition to being harmonized. The Python and R versions of S-PrediXcan (https://github.com/gamazonlab/MR-JTI/blob/master/model_training/predixcan/src/run.sh) handle allele flipping (harmonization), but they do not perform imputation. Do I need to additionally impute my GWAS data based on the GTEx models?

I am very much looking forward to your reply, thanks!
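The selection rule described in question 1 can be written as a small filter. This Python sketch only encodes the criteria as stated in the question (eqtl_p < 5e-08 and gwas_p > 5e-08); the field names are hypothetical, and whether an outcome p-value filter is appropriate for MR-JTI is the open question here, not something the thread confirms.

```python
def select_instruments(snps, eqtl_p_max=5e-8, gwas_p_min=5e-8):
    """Return rsIDs that pass both thresholds.

    snps: iterable of dicts with hypothetical keys 'rsid', 'eqtl_p', 'gwas_p'.
    Keeps SNPs strongly associated with expression (eqtl_p < eqtl_p_max)
    that are not genome-wide significant for the outcome (gwas_p > gwas_p_min).
    """
    return [
        s["rsid"]
        for s in snps
        if s["eqtl_p"] < eqtl_p_max and s["gwas_p"] > gwas_p_min
    ]
```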

zdangm commented 1 year ago

Imputation is always performed before the GWAS itself. S-PrediXcan/JTI takes GWAS summary statistics as input, and most GWAS summary statistics already report association results for imputed variants (typically 3-10 million variants). So you don't need to do additional imputation. Dan
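Because the models can only use variants that the summary statistics report, one quick sanity check is to measure how many of a gene model's SNPs appear in the GWAS, matched by rsID. The Python sketch below is a hypothetical helper, not part of S-PrediXcan or JTI; it returns the fraction of a model's SNPs present in the GWAS.

```python
def model_snp_coverage(model_rsids, gwas_rsids):
    """Fraction of a gene model's weight SNPs found in the GWAS (matched by rsID)."""
    model = set(model_rsids)
    if not model:
        return 0.0  # empty model: nothing to cover
    return len(model & set(gwas_rsids)) / len(model)
```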

forget999 commented 1 year ago

Hi! Sincerely thank you very much for your reply!

I noticed that the JTI GTEx models are based on the hg19 reference genome (https://doi.org/10.5281/zenodo.3842289), but my GWAS summary data are on hg38. Could you provide JTI GTEx models based on hg38 so that they match the reference genome build of my GWAS data?

In addition, could you provide the joint covariance for the multi-tissue test, so that a joint test across all tissues can be run in SMultiXcan (e.g. the joint covariance for the 49 GTEx v8 tissues)? Otherwise, I would like to know more details of the method for estimating the tissue-tissue covariance that you mentioned (https://github.com/gamazonlab/MR-JTI/issues/4#issuecomment-787115295).

I'm very sorry to disturb you again, but I really need your help. Thanks! @zdangm

zdangm commented 1 year ago

Hi

  1. No worries about the genome build: JTI uses rsIDs, not positions, to map variants.
  2. I don't have the joint covariance for the multi-tissue test at hand. I think it can be generated using EUR samples from the 1000 Genomes Project. The covariance matrix I mentioned on the GitHub page is used for the GBJ test, and I think it is different from the one used by SMultiXcan. Dan
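One way such a covariance could be built, sketched in Python under stated assumptions: for a gene, compute the SNP covariance matrix Σ from reference genotypes (e.g. 1000 Genomes EUR samples); the covariance of predicted expression between tissues t and s is then w_t^T Σ w_s, where w_t holds the model weights in tissue t (0 for SNPs absent from that tissue's model). This follows from predicted expression being a weighted sum of dosages. It is not presented as the exact procedure used by SMultiXcan or by the MR-JTI authors, and the function names are hypothetical.

```python
def covariance_matrix(genotypes):
    """SNP-by-SNP sample covariance (denominator n - 1).

    genotypes: n_samples x n_snps list of dosage rows from a reference panel.
    """
    n, p = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    return [
        [sum((row[a] - means[a]) * (row[b] - means[b]) for row in genotypes) / (n - 1)
         for b in range(p)]
        for a in range(p)
    ]


def cross_tissue_covariance(weights, snp_cov):
    """Covariance of predicted expression between every pair of tissues.

    weights: n_tissues x n_snps list of model weights (0 where a SNP is absent).
    Entry [t][s] equals w_t^T * Sigma * w_s.
    """
    p = len(snp_cov)

    def quad(u, v):
        # Bilinear form u^T Sigma v.
        return sum(u[a] * snp_cov[a][b] * v[b] for a in range(p) for b in range(p))

    return [[quad(w_t, w_s) for w_s in weights] for w_t in weights]
```

Dividing each entry by the square root of the product of the corresponding diagonal entries would give the cross-tissue correlation instead of the covariance.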
