gaow / neuro-twas

Development code (private repo) for TWAS related multiomic gene-mapping in Alzheimer's disease data

Lack of justification of using a sparse model for prediction and new structure of story. #28

Closed hsun3163 closed 3 years ago

hsun3163 commented 3 years ago

The relationships between SNVs and observed traits can be briefly summarized as follows:

[Image: Screen Shot 2020-11-12 at 2 48 58 PM]

A GWAS measures the total association between all the SNVs and the traits. A TWAS instead captures all of the association of the type A SNVs and a proportion of that of the type B SNVs. That said, the association explained by a perfect TWAS should be a subset of the association explained by a perfect GWAS.

However, GWAS is not perfect. The reason for using TWAS for prediction, as justified in the FUSION paper, is that there are some SNVs whose contribution to the traits is too small, yet whose effect on expression is significant enough (filtered by significant hsq). Using their effect on expression to amplify the relationship between the SNVs and the traits therefore seems to make sense. As pointed out by the authors of FUSION:

"when the cis expression of a gene is driven entirely by a single eQTL $i$, the resulting test statistic will be equal to the raw GWAS association of $i$."

However, both our sparse models and the top1 model that FUSION uses ignore the small effects that are used to justify FUSION. Therefore, if we still want to do prediction, we may need to find another justification for using a sparse model. Indeed, a quick scan suggests that the eQTL studies cited by FUSION as "alternative approaches" focus on identifying eQTLs that are associated with SNPs that are significant in GWAS.
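The FUSION quote above, and the concern about sparse models, can be checked with a small numerical sketch. It assumes the summary-statistic form of the TWAS test, $z_{TWAS} = w^\top z / \sqrt{w^\top R w}$, where $w$ are the expression weights, $z$ the GWAS z-scores, and $R$ the LD (correlation) matrix of the SNVs. The function name `twas_z` and all numbers are made up for illustration and are not taken from our data.

```python
import math

def twas_z(weights, gwas_z, ld):
    """Summary-statistic TWAS z-score: w'z / sqrt(w' R w)."""
    n = len(weights)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    return numerator / math.sqrt(variance)

# Toy inputs (assumptions): four independent SNVs, each with a modest GWAS z-score.
gwas_z = [2.0, 2.0, 2.0, 2.0]
ld = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# top1-style weights: expression driven entirely by one eQTL.
top1_weights = [0.0, 0.5, 0.0, 0.0]
# Dense weights: every SNV has a small effect on expression.
dense_weights = [0.1, 0.1, 0.1, 0.1]

z_top1 = twas_z(top1_weights, gwas_z, ld)    # reduces to the GWAS z of that eQTL
z_dense = twas_z(dense_weights, gwas_z, ld)  # aggregates the four small effects

print(z_top1, z_dense)
```

In this toy setting the single-eQTL weights give back the GWAS z-score of that one SNV, while the dense weights combine the four modest signals into a larger statistic. That is the amplification argument; a sparse model with one or two nonzero weights behaves much like the top1 case.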

At this point, it may make sense to reformat our results into one such study, along the lines of:


"Among a list of genes implicated by GWAS-related SNVs that are associated with Alzheimer's disease, how are their potential eQTLs associated with the disease in each tissue of interest? Is the common eQTL associated with expression in both tissues also the one most associated with the disease? If not, what is the reason?"

The data can be linked in the following way:

  1. The TWAS P-values suggest a relatively significant relationship between the expression of genes and tissues.
  2. The distribution of models indicates that quite a few of the genes have their expression driven by a couple of SNVs.

So we want to know which SNVs drive the expression of those genes, and whether those SNVs are tissue-specific.

  3. The hsq estimated by FUSION and SuSiE shows that most of the heritability of the genes is indeed attributable to 1 or 2 SNVs (each CS stands for 1 SNV). This confirms the result in 2, so what are those SNVs?

  4. It turns out that certain genes are rather tissue-specific (33/76), but 32/76 of the genes have their expression controlled by eQTLs in all three tissues, and 11 of them are controlled in two tissues.

4.1. Among all the cross-tissue cases, 31/43 are controlled by 1 common SNV. Those SNVs are likely to be .... and their GWAS stats are .... (To be mapped; should be simple.)

4.1.1. 4 out of 5 genes with completely common SNVs also have significant TWAS results. Among the four, one gene, SCIMP, is of particular interest: it is TWAS-significant in DLPFC, but the sparse model failed to identify an SNV driving its expression there. Conversely, where its eQTLs are identified, its expression is not significantly associated with Alzheimer's disease.

4.2. Among the tissues that don't have overlaps, it turns out that their TWAS results are not very significant either.

  5. Since it is uncertain exactly which SNVs are involved in 5, a downstream multivariate analysis is desirable to narrow down the CS window.
  6. Now that we have some interesting genes, we should annotate them by function and literature search.
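
The tissue breakdown in the list above (tissue-specific genes, genes with eQTLs in two or three tissues, and cross-tissue genes that share a common SNV) could be tabulated along the following lines. This is only a sketch under assumptions: the input layout (gene, then tissue, then the set of SNVs in its credible sets), the helper `classify`, and every gene, tissue, and SNV name are placeholders, not results from this analysis.

```python
from collections import Counter

# Hypothetical toy input: gene -> tissue -> SNV IDs found in credible sets.
# An empty set means no eQTL signal was identified in that tissue.
credible_sets = {
    "GENE_A": {"tissue_1": {"snv1"}, "tissue_2": {"snv1"}, "tissue_3": {"snv1", "snv7"}},
    "GENE_B": {"tissue_1": {"snv2"}, "tissue_2": {"snv3"}, "tissue_3": set()},
    "GENE_C": {"tissue_1": set(), "tissue_2": {"snv4"}, "tissue_3": set()},
}

def classify(per_tissue):
    """Return (number of tissues with a signal, SNVs shared by all of them)."""
    hits = [snvs for snvs in per_tissue.values() if snvs]
    shared = set.intersection(*hits) if hits else set()
    return len(hits), shared

summary = {gene: classify(cs) for gene, cs in credible_sets.items()}

# How many genes have a signal in 1, 2, or 3 tissues.
tissue_counts = Counter(n for n, _ in summary.values())

# Cross-tissue genes (signal in 2+ tissues) with at least one common SNV.
common_snv_genes = sorted(
    gene for gene, (n, shared) in summary.items() if n >= 2 and shared
)

print(tissue_counts, common_snv_genes)
```

Exact set intersection is a deliberately strict notion of a "common SNV"; matching SNVs in high LD would need the LD matrix and is left out here.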
hsun3163 commented 3 years ago

The aim of the research can be outlined as answering the following questions:

If the impact is too weak to be detected, could aggregating the effects via molecular phenotypes help to produce the prediction?

What information might be missed by univariate analysis (conventional TWAS) but recovered via multivariate analysis?

Why use a new model? For more power.

A summary following this structure, together with the answer to this issue, is outlined in issue #29. This ticket is hereby closed.