PediatricOpenTargets / OpenPedCan-api


Add DESeq differential gene expression table and plot Endpoints #37

Status: Closed (afarrel closed this issue 1 year ago)

afarrel commented 2 years ago

Description of Issue/Task

Add bulk tissue differential gene expression table and plot endpoints:

  1. Convert/add DESeq tables to the database.
  2. Convert scripts to integrate with the API.
  3. Test, and add variables/options for the end user to customize the plot.

What Data module is this related to?

DESeq data module run on CAVATICA, and the complex heatmap/barplot displaying fold change and TPM values

What release version of the data will be used?

Data from release v9

What input or requirements are needed from other modules/tickets?

  1. Final DESeq table (RDS file) from DESeq module
  2. R script to generate plots

When do you expect this to be completed?

3 weeks

Who will complete the updated analysis?

@afarrel

logstar commented 2 years ago

@afarrel MTP pages use the EFO ID in the URL to point to specific diseases. For example, https://ppdc-otp-dev.bento-tools.org/evidence/ENSG00000171094/EFO_0000621. MTP will send API requests with the EFO ID as a parameter, and the current MTP will not be able to provide the OpenPedCan-analysis cancer_group as a parameter.

For the heatmap endpoint design, how should differential gene expression (DGE) heatmaps with genes as rows be generated for EFO IDs that are mapped to multiple diseases? For example, EFO_0000621 is mapped to both CNS neuroblastoma and Neuroblastoma, which are separate cancer_groups in OpenPedCan-analysis v9.

In database building #59, I am planning to rank genes for each cancer_group. When generating DGE heatmaps with genes as rows, a table of the top genes of one or more cancer_groups will be returned from the database. When this table has two or more cancer_groups, one heatmap may need to be generated for each cancer_group.
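A minimal sketch of the per-cancer_group split described above, written in Python only for illustration (the API itself is written in R). The column names `efo_id`, `cancer_group`, `gene`, and `rank`, the placeholder gene names, and the function `top_genes_by_cancer_group` are all hypothetical and are not taken from the OpenPedCan-api database schema.

```python
from collections import defaultdict


def top_genes_by_cancer_group(rows, n_top=2):
    """Group DGE rows by cancer_group and keep the n_top best-ranked rows.

    Assumption: a smaller "rank" value means a higher-ranked gene.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["cancer_group"]].append(row)
    return {
        group: sorted(members, key=lambda r: r["rank"])[:n_top]
        for group, members in grouped.items()
    }


# Hypothetical table returned for one EFO ID that maps to two cancer_groups.
rows = [
    {"efo_id": "EFO_0000621", "cancer_group": "Neuroblastoma", "gene": "GENE_C", "rank": 3},
    {"efo_id": "EFO_0000621", "cancer_group": "Neuroblastoma", "gene": "GENE_A", "rank": 1},
    {"efo_id": "EFO_0000621", "cancer_group": "Neuroblastoma", "gene": "GENE_B", "rank": 2},
    {"efo_id": "EFO_0000621", "cancer_group": "CNS neuroblastoma", "gene": "GENE_D", "rank": 1},
]

# One entry per cancer_group; each entry would feed one heatmap.
heatmap_inputs = top_genes_by_cancer_group(rows, n_top=2)
for group, members in heatmap_inputs.items():
    print(group, [m["gene"] for m in members])
```

Keeping the split as a separate step means the plotting code only ever sees rows for a single cancer_group, regardless of how many cancer_groups an EFO ID maps to.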

cc @taylordm @chinwallaa

logstar commented 2 years ago

@afarrel @taylordm @chinwallaa I have added PNG plot and JSON table endpoints for one-EFO differential gene expression (DGE) heatmaps, i.e., DGE heatmaps with top genes of one EFO ID as rows and all GTEx tissues as columns. The heatmap code is adapted from @afarrel's code. Let me know if you have any questions or suggestions.

I will be implementing PNG plot and JSON table endpoints for one-ENSG DGE heatmaps, i.e., DGE heatmaps with all diseases as rows and all GTEx tissues as columns.

To speed up the delivery of heatmap endpoints, the one-EFO DGE heatmap currently does not have boxplots on the sides. The side boxplots will be implemented after all heatmap endpoints have been developed.

The row labels of the one-EFO heatmaps currently include disease/cancer_group, cohort, ENSG ID, and gene symbol, to uniquely identify a differential gene comparison with a GTEx tissue, due to the following data properties:

Following is a brief description of the implemented one-disease DGE heatmap endpoints.

The one-EFO endpoints are registered at https://github.com/PediatricOpenTargets/OpenPedCan-api/blob/60c03a4440a0b7684fcb960024cbc871fa717b46/src/plumber.R#L220-L259

logstar commented 2 years ago

@afarrel @chinwallaa @taylordm I have added PNG plot and JSON table endpoints for one-ENSG differential gene expression (DGE) heatmaps, i.e., DGE heatmaps with all diseases/cancer_groups as rows and all GTEx tissues as columns. The heatmap code is adapted from @afarrel's code. Let me know if you have any questions or suggestions.

I will be adding HTTP tests to all DGE endpoints.

To speed up the delivery of heatmap endpoints, the one-ENSG DGE heatmap currently also does not have boxplots on the sides. The side boxplots will be implemented after all heatmap endpoints have been developed.

Following is a brief description of the implemented one-disease DGE heatmap endpoints.

The one-ENSG endpoints are registered at https://github.com/PediatricOpenTargets/OpenPedCan-api/blob/a93845a9dc601d6a25f15e16cc901589b447abcc/src/plumber.R#L259-L292

logstar commented 2 years ago

@afarrel @taylordm @chinwallaa I have added tests for the differential gene expression (DGE) heatmap endpoints. All tests passed.

$ ./tests/run_tests.sh 
API base URL: http://localhost:8082

✔ |  OK F W S | Context
✔ | 354       | tests/r_test_scripts/test_endpoint_http.R [612.5 s]

══ Results ══════════════════════
Duration: 612.6 s

The following boxplot shows the endpoint response times of OpenPedCan-api. The new DGE heatmap endpoints can generally handle each request within 2.5 seconds.

[Figure: endpoint_response_time_boxplot]
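As a rough illustration of how per-request response times could be collected and checked against the 2.5 second figure, here is a small Python sketch. It is not the project's test code (the tests above are R scripts); `time_calls`, `summarize`, and `fake_request` are hypothetical names, and `fake_request` stands in for a real HTTP call to an endpoint.

```python
import statistics
import time


def time_calls(request_fn, n_repeats=5):
    """Call request_fn n_repeats times and return each duration in seconds."""
    durations = []
    for _ in range(n_repeats):
        start = time.perf_counter()
        request_fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations, limit_seconds=2.5):
    """Summarize durations and flag whether every call met the limit."""
    return {
        "median": statistics.median(durations),
        "max": max(durations),
        "all_within_limit": all(d <= limit_seconds for d in durations),
    }


def fake_request():
    # Stand-in for a real endpoint request; sleeps briefly instead.
    time.sleep(0.01)


summary = summarize(time_calls(fake_request, n_repeats=3))
print(summary)
```

Collecting the raw durations per endpoint, rather than only a pass/fail flag, is what makes a boxplot like the one above possible.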

I will be working on the following items:

I have also changed the endpoint tags and paths of DGE heatmaps to distinguish from TPM boxplot endpoints.