d3b-center / ticket-tracker-OPC

A repo to generate and track tickets for ped OT

Updated analysis: generate TCGA differential gene expression table using `tumor-normal-differential-expression` #367

Closed: logstar closed this issue 1 year ago

logstar commented 2 years ago

What analysis module should be updated and why?

tumor-normal-differential-expression needs to be updated to generate a TCGA differential gene expression (DGE) table, because TCGA data will be included in the OpenPedCan-analysis tables and plots for integration with MTP after the v11 release.

This issue is kept separate from the issue for updating tumor-normal-differential-expression with v11 data, so that each can be tracked more precisely, as suggested by @jharenza.

What changes need to be made? Please provide enough detail for another participant to make the update.

Currently, the DGE analysis uses only gene-expression-rsem-tpm-collapsed.rds to compare pediatric cancer_groups with GTEx tissue groups, generating results/deseq_all_comparisons.jsonl and results/deseq_all_comparisons.tsv.

Add new workflows/commands/scripts to run the same DGE analysis on tcga-gene-expression-rsem-tpm-collapsed.rds. The new analysis should generate results/tcga_deseq_all_comparisons.jsonl and results/tcga_deseq_all_comparisons.tsv. Share all result files via the OpenPedCan-analysis S3 bucket.
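The input-to-output mapping described above can be sketched as follows. This is only an illustration in Python and says nothing about how the module itself is implemented: `DgeRun`, `planned_runs`, and the idea of a per-dataset output prefix are hypothetical. Only the input and output file names are taken from this issue.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class DgeRun:
    """Hypothetical description of one DGE run: an input matrix and its outputs."""

    expression_file: str
    output_prefix: str  # "" for the existing run, "tcga_" for the new TCGA run

    def outputs(self) -> list[str]:
        # Both result files share one base name and differ only by extension.
        base = PurePosixPath("results") / f"{self.output_prefix}deseq_all_comparisons"
        return [f"{base}.jsonl", f"{base}.tsv"]


def planned_runs() -> list[DgeRun]:
    """Return the existing run and the TCGA run requested in this issue."""
    return [
        DgeRun("gene-expression-rsem-tpm-collapsed.rds", ""),
        DgeRun("tcga-gene-expression-rsem-tpm-collapsed.rds", "tcga_"),
    ]


if __name__ == "__main__":
    for run in planned_runs():
        print(run.expression_file, "->", ", ".join(run.outputs()))
```

Deriving both sets of output names from a single prefix would keep the TCGA files parallel to the existing ones, which may make downstream loading simpler.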

Note: When implementing the new DGE analysis for TCGA, consider interfaces for integrating the batch effect removal procedure discussed in https://github.com/PediatricOpenTargets/OpenPedCan-analysis/pull/168.
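One possible shape for such an interface is an optional correction step applied to the expression matrix before the DGE comparison. The Python sketch below is an assumption for discussion, not the procedure from the linked pull request: `prepare_expression`, `BatchCorrector`, and the dict-based `Matrix` type are all hypothetical, and no actual batch correction method is implemented.

```python
from typing import Callable, Optional

# Illustrative stand-in for an expression matrix: gene -> {sample -> value}.
Matrix = dict[str, dict[str, float]]

# Hypothetical hook type: any function that maps a matrix to a corrected matrix.
BatchCorrector = Callable[[Matrix], Matrix]


def prepare_expression(
    matrix: Matrix,
    batch_corrector: Optional[BatchCorrector] = None,
) -> Matrix:
    """Return the matrix to pass to DGE, applying batch correction if a hook is given."""
    if batch_corrector is None:
        # No hook supplied: the analysis runs on the uncorrected matrix.
        return matrix
    return batch_corrector(matrix)
```

With a hook like this, the TCGA analysis could run unchanged today and accept a correction function later without altering the comparison code.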

@sangeetashukla - I was wondering if you could briefly describe the general steps for adding the new DGE analysis for TCGA.

@jharenza - I was wondering if any additional comparisons need to be added for TCGA data, e.g., TCGA cancer_group vs pediatric cancer_group.

What input data should be used? Which data were used in the version being updated?

tcga-gene-expression-rsem-tpm-collapsed.rds and the existing input data for tumor-normal-differential-expression.

Either the v10 or the v11 data release can be used for developing the new analysis.

When do you expect the revised analysis will be completed?

@sangeetashukla - I was wondering if you could roughly estimate the time required for resolving this issue.

Who will complete the updated analysis?

cc @jharenza @afarrel @chinwallaa

sangeetashukla commented 2 years ago

@logstar I estimate less than two weeks to update the module and the Cavatica app. I have assigned this ticket to myself.