gamazonlab / MR-JTI

MIT License

How to use external tissue #3

Closed: montenegrina closed this issue 3 years ago

montenegrina commented 3 years ago

Hello,

Can you please tell me how I would include a tissue that is not part of GTEx in your analysis? I plan to use Retina tissue, and the weights for it are provided in RDat format. Do I understand correctly that you need a .db file for each tissue?

I initially used UTMOST to create Retina.db via:

```
python make_sqlite_db.py --output Retina_new.db --betas Retina.weight_new.txt --results Retina.extra.txt --construction Retina.construction.txt --meta Retina.sample.txt
```

make_sqlite_db.py is here

where Retina.weight_new.txt has 86972032 lines and looks like this:

```
$ head Retina.weight.txt
rsid gene weight ref_allele eff_allele
rs73129264 DPM1 0.0002645231 G T
rs4287819 DPM1 0.0002703946 T G
rs117447227 DPM1 -0.0002199045 C T
rs6125825 DPM1 5.049009e-05 T C
...
$ wc -l Retina.weight.txt
86972032 Retina.weight.txt
```

```
$ head Retina.extra.txt
gene genename pred.perf.R2 n.snps.in.model pred.perf.pval pred.perf.qval
A1BG-AS1 A1BG-AS1 0.5 3117 1e-10 1e-10
A2M A2M 0.5 4812 1e-10 1e-10
A2M-AS1 A2M-AS1 0.5 4663 1e-10 1e-10
A2ML1 A2ML1 0.5 4681 1e-10 1e-10
...
$ wc -l Retina.extra.txt
16971 Retina.extra.txt
```

Note: Retina.extra.txt is required to run make_sqlite_db.py. I created it from the genes and the number of SNPs per gene that I have for Retina. For the other columns I used constant values, because the extra files for all weight tissues in UTMOST have these values:

```r
retina$pred.perf.R2 <- 0.5
retina$pred.perf.pval <- 1e-10
retina$pred.perf.qval <- 1e-10
```

The construction file looks like this:

```
$ head Retina.construction.txt
chr cv.seed
1 1782
10 706
11 975
12 978
...
$ wc -l Retina.construction.txt
23 Retina.construction.txt
```
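A minimal Python sketch of the step described above: counting SNPs per gene in the weights file and filling the remaining `extra` columns with the constant values 0.5 and 1e-10. The function names are hypothetical, and the sketch assumes the weights file is whitespace-separated with a header row containing a `gene` column, as in the `head` output shown earlier. Because the performance columns are constants, they do not reflect real prediction performance.

```python
from collections import Counter
from typing import Iterable, Iterator

EXTRA_HEADER = ["gene", "genename", "pred.perf.R2", "n.snps.in.model",
                "pred.perf.pval", "pred.perf.qval"]


def count_snps_per_gene(lines: Iterable[str]) -> Counter:
    """Count weight rows per gene; the first line must be the header."""
    it = iter(lines)
    gene_idx = next(it).split().index("gene")  # locate the 'gene' column by name
    counts: Counter = Counter()
    for line in it:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[gene_idx]] += 1
    return counts


def extra_lines(counts: Counter, r2: float = 0.5, pval: float = 1e-10,
                qval: float = 1e-10) -> Iterator[str]:
    """Yield tab-separated 'extra' rows. The performance columns are constant
    placeholders, and genename simply repeats the gene column."""
    yield "\t".join(EXTRA_HEADER)
    for gene in sorted(counts):
        yield "\t".join([gene, gene, str(r2), str(counts[gene]), str(pval), str(qval)])


def write_extra(weights_path: str, extra_path: str) -> None:
    """Hypothetical helper: stream the weights file and write the extra file."""
    with open(weights_path) as fh:
        counts = count_snps_per_gene(fh)
    with open(extra_path, "w") as out:
        for row in extra_lines(counts):
            out.write(row + "\n")
```

`count_snps_per_gene` reads the file line by line, so even the 87-million-line weights file only keeps the per-gene counts in memory.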

The number of samples for Retina is 406:

```
$ head Retina.sample.txt
n.samples
406
```
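For reference, a minimal sketch of building such a .db with Python's built-in `sqlite3` module, loading each text file into its own table. The table names (`weights`, `extra`, `construction`, `sample_info`) and the reuse of each file's header as column names are assumptions about the PrediXcan/UTMOST .db layout, not taken from make_sqlite_db.py itself, so compare the result with one of the downloaded .db files before relying on it. Function names are hypothetical.

```python
import sqlite3
from typing import Dict, Iterable, Union

Value = Union[int, float, str]


def _convert(token: str) -> Value:
    """Store integers and floats as numbers, everything else as text."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def load_table(conn: sqlite3.Connection, table: str, lines: Iterable[str]) -> int:
    """Create `table` from whitespace-separated lines (first line = header)
    and return the number of rows loaded."""
    it = iter(lines)
    header = next(it).split()
    cols = ", ".join(f'"{c}"' for c in header)  # quote names like pred.perf.R2
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    rows = (tuple(_convert(t) for t in line.split()) for line in it if line.strip())
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', rows)
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


def build_db(db_path: str, sources: Dict[str, str]) -> None:
    """sources maps table name -> text file, e.g. {"weights": "Retina.weight_new.txt"}."""
    conn = sqlite3.connect(db_path)
    try:
        for table, path in sources.items():
            with open(path) as fh:
                load_table(conn, table, fh)
        if "weights" in sources:
            # Index for per-gene lookups; an addition, not part of the text files.
            conn.execute("CREATE INDEX weights_gene ON weights (gene)")
        conn.commit()
    finally:
        conn.close()
```

Usage would look like `build_db("Retina_new.db", {"weights": "Retina.weight_new.txt", "extra": "Retina.extra.txt", "construction": "Retina.construction.txt", "sample_info": "Retina.sample.txt"})`.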

Can you please advise what I should do to create a Retina.db file that I can use with your software?

I found this code: https://github.com/gamazonlab/MR-JTI/blob/master/model_training/predixcan/src/run.sh

I plan to run MR-JTI on my META GWAS results.

In UTMOST I am used to working with dosage files, but here I see the term "plink_files". Can you please tell me which PLINK files would be appropriate for my analysis?
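In PLINK terminology, a binary genotype fileset is three files sharing one prefix: `.bed` (genotypes), `.bim` (variant information) and `.fam` (sample information). Assuming `plink_files` in the MR-JTI scripts refers to such a prefix (the thread does not confirm this), a small Python sketch can check that all three parts exist and read the six `.bim` columns (chromosome, variant ID, genetic position in cM, base-pair position, allele 1, allele 2), for example to see what fraction of a model's rsIDs are present in the genotypes. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Iterator, List, NamedTuple


class BimRecord(NamedTuple):
    chrom: str
    snp: str
    cm: float
    bp: int
    a1: str
    a2: str


def missing_plink_parts(prefix: str) -> List[str]:
    """Return which of .bed/.bim/.fam are missing for a fileset prefix."""
    return [ext for ext in (".bed", ".bim", ".fam") if not Path(prefix + ext).is_file()]


def parse_bim(lines: Iterable[str]) -> Iterator[BimRecord]:
    """Parse the six whitespace-separated .bim columns, skipping blank lines."""
    for line in lines:
        f = line.split()
        if f:
            yield BimRecord(f[0], f[1], float(f[2]), int(f[3]), f[4], f[5])


def rsid_overlap(bim_lines: Iterable[str], model_rsids: Iterable[str]) -> float:
    """Fraction of model SNPs (e.g. the rsid column of a weights file) found in the .bim."""
    wanted = set(model_rsids)
    if not wanted:
        return 0.0
    found = {rec.snp for rec in parse_bim(bim_lines)} & wanted
    return len(found) / len(wanted)
```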

I am assuming I would download the .db files for all 49 GTEx tissues from here: https://zenodo.org/record/3842289#.X_aEdR17lp8

Could you also share the code to run this analysis across all 49 GTEx tissues plus Retina?
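However the per-tissue runs end up being organized, a quick sanity check is to open every downloaded model and count its genes. This sketch assumes the .db files sit in one directory and that each has an `extra` table with a `gene` column, as in the UTMOST-style files above; results are keyed by file name, and the function name is hypothetical.

```python
import sqlite3
from pathlib import Path
from typing import Dict


def genes_per_model(db_dir: str, table: str = "extra") -> Dict[str, int]:
    """Map each *.db file in db_dir (by file stem) to its number of distinct genes."""
    counts: Dict[str, int] = {}
    for db in sorted(Path(db_dir).glob("*.db")):
        # Open read-only so the check cannot modify the downloaded models.
        conn = sqlite3.connect(db.resolve().as_uri() + "?mode=ro", uri=True)
        try:
            query = f'SELECT COUNT(DISTINCT gene) FROM "{table}"'
            counts[db.stem] = conn.execute(query).fetchone()[0]
        finally:
            conn.close()
    return counts
```

A tissue whose count is unexpectedly small, or that raises an error because the table is missing, is worth inspecting before running any analysis on it.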

I am sorry for these basic questions; I am very new to this type of analysis.

zdangm commented 3 years ago

Hi,

I am not sure whether you are interested in model training using JTI, in causal inference using MR-JTI, or both.

For JTI: JTI is a joint-tissue imputation framework that borrows information from other potentially relevant tissues for model training. That means you will need to prepare data (including genotype and expression) from all available tissues as your input files. Notably, the expression levels of the retina and of all 49 other tissues should be on the same scale. I hope the example files at https://github.com/gamazonlab/MR-JTI/blob/master/model_training/JTI/JTI_example.zip will be helpful; you can find the formats for both genotype and expression data there.

The .db file usually contains pre-trained model information, which means it is the output of model training, not an input to it. If you plan to generate prediction models for Retina using JTI, you will need to prepare genotype and expression files for the retina as well as for all other available tissues, because the target tissue (retina) borrows information from the other tissues to improve prediction quality.

For MR-JTI: MR-JTI uses the SNPs' marginal effects rather than the weights from the prediction model. The marginal effects can be downloaded from the GTEx portal.

Let me know if you have any other questions.

Dan

montenegrina commented 3 years ago

Hello,

Thank you so much for getting back to me and clarifying this.

I would like to run both: JTI and MR-JTI.

So far I have been using UTMOST for TWAS, but I read about your software (JTI) that "JTI borrows information across transcriptomes of different tissues, leveraging shared genetic regulation, to improve prediction performance in a tissue-dependent manner." In that regard it is better than UTMOST, so I would like to use it.

So, to start with JTI, I guess I would generate a .db and a .cov file for each of the 49 GTEx tissues and for Retina.

To do that, should I follow this workflow? https://github.com/gamazonlab/MR-JTI/blob/master/model_training/predixcan/src/run.sh
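In S-PrediXcan-style workflows, a .cov file holds the covariance of SNP dosages among the SNPs in each gene's model. This sketch assumes rows of `GENE RSID1 RSID2 VALUE`, a sample covariance (n - 1 denominator), and that each unordered SNP pair is written once with the diagonal included; these are assumptions, so compare with a .cov file produced by the linked workflow before using it. It takes one gene's per-SNP dosage vectors as a dict, and all names are hypothetical.

```python
from itertools import combinations_with_replacement
from typing import Dict, Iterator, Sequence

COV_HEADER = "GENE RSID1 RSID2 VALUE"  # assumed header, written once per file


def sample_cov(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample covariance with an n - 1 denominator."""
    n = len(x)
    if n < 2 or len(y) != n:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def cov_lines(gene: str, dosages: Dict[str, Sequence[float]]) -> Iterator[str]:
    """Yield 'GENE RSID1 RSID2 VALUE' rows for every unordered SNP pair of one
    gene's model, diagonal (variances) included."""
    snps = sorted(dosages)
    for s1, s2 in combinations_with_replacement(snps, 2):
        yield f"{gene} {s1} {s2} {sample_cov(dosages[s1], dosages[s2]):.6g}"
```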

As you mentioned, I would need to obtain genotype and gene expression files for all 49 GTEx tissues as well as for Retina.

For Retina I do have gene expression data and it looks like this:

```
$ head GSE115828_DE_analysis.txt
ensembl_gene_id external_gene_name strand chromosome_name start_position end_position gene_biotype logFC FC AveExpr t P.Value adj.P.Val B
ENSG00000110777 POU2AF1 -1 11 111352252 111455630 protein_coding 0.825519054815263 1.77217251819843 -1.6967531192659 4.3341216126258 1.82431919764831e-05 0.106212068263311 0.0168668362388846
```

When I look at the example you shared with me, the expression file looks like this:

```
$ head jti_example_exp.txt
tissue sampleid exp exp_w dhs_w
Adipose_Subcutaneous S951351 1.9287 1 1
Adipose_Subcutaneous S718426 -0.106 1 1
...
```

Should I take AveExpr from my GSE115828_DE_analysis.txt file as the corresponding "exp"? Also, how do I calculate "exp_w" and "dhs_w" for Retina?
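The example file has one row per tissue and sample (`sampleid`), so `exp` appears to be a per-sample expression value, whereas each row of a differential-expression table summarizes a gene across all samples (in limma output, `AveExpr` is the gene's average log-expression). The sketch below reshapes per-sample values for one gene into the example's five-column layout. It assumes tab-separated output and one file per gene (the example has no gene column), optionally z-scores within each tissue as one possible way to put tissues on a common scale, and writes `exp_w` and `dhs_w` as 1 only to mirror the example rows; how JTI derives those weights is not shown here. Names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Iterator

JTI_EXP_HEADER = ["tissue", "sampleid", "exp", "exp_w", "dhs_w"]


def zscore(values: Dict[str, float]) -> Dict[str, float]:
    """Standardize one gene's values within a tissue (mean 0, population SD 1)."""
    vals = list(values.values())
    mu, sd = mean(vals), pstdev(vals)
    if sd == 0:
        return {k: 0.0 for k in values}
    return {k: (v - mu) / sd for k, v in values.items()}


def jti_exp_lines(per_tissue: Dict[str, Dict[str, float]],
                  scale: bool = True) -> Iterator[str]:
    """Yield tab-separated rows for ONE gene: per_tissue maps tissue ->
    {sample id -> expression}. exp_w and dhs_w are placeholders (1)."""
    yield "\t".join(JTI_EXP_HEADER)
    for tissue in sorted(per_tissue):
        values = zscore(per_tissue[tissue]) if scale else per_tissue[tissue]
        for sample, value in values.items():
            yield f"{tissue}\t{sample}\t{value:.4f}\t1\t1"
```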

Cheers, Ana

gamazonlab commented 3 years ago

Hi Ana, feel free to email Dan and me at ericgamazon at gmail.com and zdangm at gmail.com. It might be easier to set up a Zoom call. -- E