gaow / neuro-twas

Development code (private repo) for TWAS related multiomic gene-mapping in Alzheimer's disease data

Project aims and action plans #1

Open gaow opened 4 years ago

gaow commented 4 years ago

This ticket outlines what I have in mind for now to get us started. I may keep editing this post as I think of more to add or reorganize, so please check back from time to time.

@hsun3163 for starters, please click on the watch button at the top right of the repo to receive notifications for new tickets opened.

In between the lines below, I'll add some TODO boxes for you to complete and check off.

TWAS background & warming up

A starting point is the FUSION package. In addition to being a software meta-package, it is also a nice resource for finding TWAS reference papers and for downloading small test data sets to learn about the analysis.

Methods-wise, I have two high-level suggestions: 1) View the methods as variable selection (VS) tasks in regression, and compare them from a quantitative genetics point of view: for each assumption a VS method makes, ask whether it makes sense in the context of genetics. 2) Understand how TWAS works with the so-called "GWAS summary statistics".
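
To make suggestion 2 concrete, here is a minimal sketch of the commonly used summary-statistic form of the TWAS test, z_TWAS = w'z / sqrt(w'Rw), where w are the prediction weights for one gene, z are the GWAS z-scores at the same variants, and R is their LD (correlation) matrix. The function name and all numbers below are made up for illustration; this is not code from this repository.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Summary-statistic TWAS z-score: (w' z) / sqrt(w' R w).

    weights : prediction weights for one gene (one per variant)
    gwas_z  : GWAS z-scores for the same variants, same order and alleles
    ld      : LD (correlation) matrix R between those variants
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("weights, gwas_z and ld must have matching dimensions")
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    if variance <= 0:
        raise ValueError("w' R w must be positive (are all weights zero?)")
    return numerator / math.sqrt(variance)


# Toy example with three variants (hypothetical values).
toy_weights = [0.5, 0.0, -0.2]
toy_gwas_z = [3.0, 1.0, -2.0]
toy_ld = [
    [1.0, 0.3, 0.0],
    [0.3, 1.0, 0.1],
    [0.0, 0.1, 1.0],
]
toy_z = twas_z(toy_weights, toy_gwas_z, toy_ld)
print(round(toy_z, 3))
```

Note that only the weights, the GWAS z-scores, and a reference LD matrix are needed; no individual-level GWAS genotypes enter the test.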

Please Slack me if you have any questions about details in those papers. You don't have to understand all the papers at once. You can get a rough idea for now and re-read them as you work on the project.

Our project

At Columbia Neurology we have multi-omics data from the brains of thousands of individuals. This is a terrific resource because, as you'll learn from the papers in the background section, multi-omics molecular phenotypes can be tissue or cell-type specific. Since disease pathology is also likely tissue specific (e.g. Alzheimer's disease (AD) and brain tissue), it makes the most sense to train prediction models on our brain multi-omics data and use them to map neurological disease associations.

Get some TWAS done

Here is a rough analysis outline:

All analyses have to be made into nicely documented SoS pipelines. I have some code in a jump-start branch of this repository as your starting point. We can talk about that branch in our meeting.

The analyses above are just conventional TWAS, but working with different molecular phenotypes can be technically challenging. To name a few challenges I can think of now:

  1. Understanding what the molecular phenotypes are and their file formats
  2. What range of cis-regulatory genotypes to consider
  3. What adjustments we should make to the model to account for confounders, etc.
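
As a small illustration of challenge 2, the sketch below selects the variants that fall within a window around a gene's transcription start site (TSS). The record layout, the function name, and the 1 Mb default are assumptions for illustration only; the appropriate range likely differs by molecular phenotype and is one of the things to look for in the papers.

```python
def cis_variants(variants, chrom, tss, window=1_000_000):
    """Return variants on `chrom` within `window` bp on either side of `tss`.

    `variants` is a list of dicts with hypothetical keys "id", "chrom", "pos".
    """
    start = max(0, tss - window)
    end = tss + window
    return [
        v for v in variants
        if v["chrom"] == chrom and start <= v["pos"] <= end
    ]


# Toy variant table (made-up identifiers and positions).
toy_variants = [
    {"id": "snp_a", "chrom": "1", "pos": 500_000},
    {"id": "snp_b", "chrom": "1", "pos": 2_400_000},
    {"id": "snp_c", "chrom": "1", "pos": 4_000_000},
    {"id": "snp_d", "chrom": "2", "pos": 2_000_000},
]
selected = cis_variants(toy_variants, chrom="1", tss=2_000_000, window=500_000)
print([v["id"] for v in selected])
```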

When you read the reference papers, particularly the practical ones, keep these questions in mind and look for answers to them.

Do something novel

We can talk about these more novel analyses after the above are done:

  1. Some molecular phenotypes might be related, or share some regulatory variants. Instead of estimating the weights one phenotype at a time, how about estimating them jointly?
  2. How about using both the genotypes around an analysis unit (e.g. a gene) and the predicted molecular phenotypes to test for disease associations?
  3. If we have enough data, how about trying to achieve 1 using approaches such as deep learning? (This needs literature research and prototyping.) I guess one obvious drawback is that it is not clear how to use summary statistics in this context.

gaow commented 4 years ago

@hsun3163 thanks, I saw your TWAS collection. Sorry, I didn't realize it was not obvious from that page. There is this line on the FUSION website:

Expression weights were typically computed from BLUP, BSLMM, LASSO, Elastic Net ...

These are actually the methods I was referring to that you should read about. The other papers are nice, but they have slightly different focuses (colocalization, fine-mapping, mediation analysis), which we will also learn about down the road. The methods papers above can help you understand the basic model and math.

A note on Elastic Net: this is actually the PrediXcan paper (Gamazon et al. 2015, Nature Genetics, from Haky Im's group), so please use that paper as the methods reference for Elastic Net. They also have a version, S-PrediXcan, for using summary statistics; please include that paper too.
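
To get a feel for what Elastic Net does as a variable selection method, here is a minimal pure-Python sketch of coordinate descent for the objective (1/2n)||y - Xb||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||^2). It is a teaching sketch under stated assumptions (centered genotypes and phenotype, fixed lam and alpha, no cross-validation), not the PrediXcan or FUSION implementation; the function names and toy data are made up.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`; exactly zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Coordinate descent for the elastic net; X is a list of rows (samples)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, with beta starting at zero
    col_ms = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # (1/n) * x_j' (partial residual that excludes variable j)
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)
            ) / n
            denom = col_ms[j] + lam * (1.0 - alpha)
            new = soft_threshold(rho, lam * alpha) / denom if denom > 0 else 0.0
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Toy data: two centered, mutually orthogonal "SNPs"; only the first affects y.
toy_X = [[-1, 1], [0, 0], [1, 1], [-1, -1], [0, 0], [1, -1]]
toy_y = [2.0 * row[0] for row in toy_X]
toy_beta = elastic_net(toy_X, toy_y, lam=0.1, alpha=0.5)
print(toy_beta)
```

In this toy example the weight of the second SNP is set exactly to zero (selection), while the weight of the first is shrunk below its true value of 2 (regularization). Setting alpha=1 gives the LASSO penalty and alpha=0 a ridge penalty, which is a useful way to compare the assumptions behind the methods listed on the FUSION page.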