FredHutch / target-data-viz

Shiny app for visualizing TARGET pediatric AML data

Addition of Differential Gene Expression Analysis Module #77

Open Logan-Wallace opened 2 months ago

Logan-Wallace commented 2 months ago

Collaborators have asked whether we could implement a module for performing differential expression analysis.

After speaking with Dan T., we believe that we can do so in the existing application without bogging the app down for other users.

It would run on the data already hosted in the application and allow for comparisons based on fusion subtypes, mutation status or other patient metadata such as age or gender.

Dan T. suggests reviewing this article on scoping to make sure that each user's session data is private to that user: https://rna-qctool.fredhutch.org/

Logan-Wallace commented 2 months ago

https://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html

Logan-Wallace commented 2 months ago

Differential Expression Analysis in Shiny

Purpose

To let users of the Meshinchi lab Shiny application run differential expression analyses interactively.

User

The user would select a cohort of interest and then a variable that we can binarize. The binarized variable splits the cohort into two groups whose transcriptomes are compared.

E.g., a user wants to identify the genes that are differentially expressed in the RUNX1-RUNX1T1 subtype compared to all other pediatric AML. We need a way for them to select the pAML cohort (TARGET 1031, SWOG, etc.) and the variable on which to split the cohort (Primary Fusion, Age, Gender, Mutations, etc.).
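To make the "binarize a variable" step concrete, here is a minimal sketch in Python (the app itself is Shiny, so this only illustrates the logic). The patient records, field names and the age threshold are all hypothetical and are not taken from the app's data. A categorical variable is split as one level versus everything else, and a numeric variable is split at a threshold.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical metadata rows; IDs, field names and values are made up.
patients: List[Dict] = [
    {"id": "P1", "cohort": "TARGET 1031", "primary_fusion": "RUNX1-RUNX1T1", "age_years": 4.0},
    {"id": "P2", "cohort": "TARGET 1031", "primary_fusion": "KMT2A-MLLT3", "age_years": 11.5},
    {"id": "P3", "cohort": "SWOG", "primary_fusion": "RUNX1-RUNX1T1", "age_years": 16.0},
    {"id": "P4", "cohort": "TARGET 1031", "primary_fusion": "CBFB-MYH11", "age_years": 2.0},
]


def split_cohort(
    rows: List[Dict], cohort: str, is_case: Callable[[Dict], bool]
) -> Tuple[List[str], List[str]]:
    """Return (case_ids, control_ids) for one cohort, split by a predicate."""
    case_ids: List[str] = []
    control_ids: List[str] = []
    for row in rows:
        if row["cohort"] != cohort:
            continue  # only patients in the selected cohort are compared
        (case_ids if is_case(row) else control_ids).append(row["id"])
    return case_ids, control_ids


# Categorical variable: one fusion subtype versus all other patients.
fusion_case, fusion_rest = split_cohort(
    patients, "TARGET 1031", lambda r: r["primary_fusion"] == "RUNX1-RUNX1T1"
)

# Numeric variable: binarize age at a (hypothetical) 10-year threshold.
younger, older = split_cohort(
    patients, "TARGET 1031", lambda r: r["age_years"] < 10
)
```

Passing the split as a predicate means the same function would cover fusion, mutation status, age or any other metadata column without special cases.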

Outputs

  1. A matrix displaying the analysis results.

log2 fold change (MLE): condition treated vs untreated
Wald test p-value: condition treated vs untreated

              baseMean log2FoldChange    lfcSE      stat    pvalue     padj
FBgn0000008   95.28865     -0.0390130 0.218997 -0.178144 0.8586100 0.947833
FBgn0000017 4359.09632     -0.2548984 0.113535 -2.245099 0.0247617 0.131475
FBgn0000018  419.06811     -0.0625571 0.129956 -0.481372 0.6302523 0.852180
FBgn0000024    6.41105      0.3097331 0.750231  0.412850 0.6797164 0.877741
FBgn0000032  990.79225     -0.0465134 0.120215 -0.386918 0.6988171 0.886082
FBgn0000037   14.11443      0.4541562 0.523436  0.867644 0.3855893 0.691941
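The column layout above looks like DESeq2-style output, where `padj` is by default a Benjamini-Hochberg adjusted p-value (DESeq2 also applies independent filtering first, which this sketch ignores). As a rough illustration of what the `padj` column represents, here is a plain Python version of the Benjamini-Hochberg adjustment. The example p-values are made up. The table values are not reused because their adjustment was computed over the full gene set, not just the six rows shown.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is capped
    # by the ones above it, which keeps the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four hypothetical genes.
example_pvalues = [0.01, 0.04, 0.03, 0.005]
example_padj = benjamini_hochberg(example_pvalues)
```

In the app we would presumably rely on whatever the analysis package reports and would not recompute this ourselves. The sketch is only meant to show why `padj` is always at least as large as `pvalue`.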

  2. Visualizations

A volcano plot (log2 fold change on the x-axis, -log10 p-value on the y-axis) is one visualization we could use to highlight genes that are both strongly and significantly changed.

[image: volcano plot]
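For the volcano plot, the data preparation could look roughly like the sketch below: each gene gets an x value (log2 fold change), a y value (-log10 of the adjusted p-value) and a label used for coloring. The cutoffs (|log2FC| >= 1, padj < 0.05) and the gene names are assumptions for illustration, not decisions made in this thread.

```python
import math
from typing import List, Optional, Tuple

# (gene, log2FoldChange, padj); hypothetical rows, padj may be missing.
ResultRow = Tuple[str, float, Optional[float]]


def volcano_points(
    results: List[ResultRow], lfc_cutoff: float = 1.0, padj_cutoff: float = 0.05
) -> List[dict]:
    """Compute plot coordinates and an up/down/ns label for each gene."""
    points = []
    for gene, lfc, padj in results:
        if padj is None:
            continue  # genes without an adjusted p-value are not plotted
        # Guard against log10(0) for p-values that underflow to zero.
        y = -math.log10(max(padj, 1e-300))
        if padj < padj_cutoff and lfc >= lfc_cutoff:
            label = "up"
        elif padj < padj_cutoff and lfc <= -lfc_cutoff:
            label = "down"
        else:
            label = "ns"  # not significant, or effect size too small
        points.append({"gene": gene, "x": lfc, "y": y, "label": label})
    return points


example_results: List[ResultRow] = [
    ("GENE_A", 2.5, 1e-6),
    ("GENE_B", -1.8, 0.003),
    ("GENE_C", 0.3, 0.6),
    ("GENE_D", 3.0, 0.2),
    ("GENE_E", 0.1, None),
]
example_points = volcano_points(example_results)
```

Keeping this step separate from the plotting call would let the same labels drive both the plot colors and a filtered results table.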

Gene expression plots showing the differences between the conditions.

[image: gene expression plots]

  3. Pathway Enrichment Analysis
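No enrichment method has been chosen yet. One common option is over-representation analysis, which asks whether a pathway's genes show up among the differentially expressed genes more often than chance would predict, using a one-sided hypergeometric test. The sketch below is only an illustration of that idea under this assumption. The function name and the example counts are hypothetical.

```python
from math import comb


def enrichment_pvalue(
    n_universe: int, n_pathway: int, n_selected: int, n_overlap: int
) -> float:
    """One-sided hypergeometric p-value, P(overlap >= n_overlap).

    n_universe: all genes tested
    n_pathway:  genes in the universe that belong to the pathway
    n_selected: differentially expressed genes
    n_overlap:  differentially expressed genes that are in the pathway
    """
    total = comb(n_universe, n_selected)
    upper = min(n_pathway, n_selected)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes tested, 4 in the pathway, 3 selected, 2 overlapping.
example_p = enrichment_pvalue(10, 4, 3, 2)
```

With many pathways tested at once, the resulting p-values would also need multiple-testing correction before being shown to users.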

Practical Thoughts

As mentioned above, we need to make sure that an analysis this data-intensive can run in the application without slowing it down, and especially without blocking other users from accessing the application at the same time. Dan T. has provided us with some resources to help with this.