Proposed Analysis: generate an Excel spreadsheet for PedOT website table display orders and names

What are the goals of the analysis?

The analysis generates an Excel spreadsheet of PedOT website table display orders and names, following the Slack discussion at https://opentargetspediatrics.slack.com/archives/C021Z53SK98/p1628181091109300.

The Excel spreadsheet will be used to coordinate table column order and name changes on PedOT website table views, and the coordination plan is summarized in the following diagram.

[Diagram: pedot_table_display_column_order_name_coordination_Aug5_2021_v1 2]

The main consideration for this coordination plan is that an Excel spreadsheet lets clinicians advise on column display orders and names without any programming barrier, as suggested by @jharenza. Another consideration is that front-end GraphQL names cannot contain spaces or / (http://spec.graphql.org/June2018/#sec-Names), but display column names may use such characters, so the display names may have to be implemented in the front-end. Front-end developers could probably also convert the Excel spreadsheet to JSON objects using Python pandas.read_excel, which would reduce manual conversion. Additionally, @jonkiky suggested multiple validation sub-tasks to ensure that the front-end column configuration file aligns with the back-end database.
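As a rough illustration of the GraphQL naming constraint, the sketch below checks field names against the Name pattern in the June 2018 spec and pairs each field with a free-form display name in a JSON-serializable list. The function names, the config layout (field, display_name, order), and the example columns are all hypothetical and are not taken from the PedOT code base.

```python
import json
import re

# Name pattern from the GraphQL June 2018 spec: /[_A-Za-z][_0-9A-Za-z]*/
GRAPHQL_NAME = re.compile(r"[_A-Za-z][_0-9A-Za-z]*")


def is_valid_graphql_name(name):
    """Return True if the whole string is a valid GraphQL name."""
    return GRAPHQL_NAME.fullmatch(name) is not None


def build_column_config(columns):
    """Build an ordered column config from (field_name, display_name) pairs.

    Field names must be GraphQL-safe; display names may contain any characters.
    The output layout is an assumption made for illustration only.
    """
    invalid = [field for field, _ in columns if not is_valid_graphql_name(field)]
    if invalid:
        raise ValueError(f"Not valid GraphQL names: {invalid}")
    return [
        {"field": field, "display_name": display, "order": order}
        for order, (field, display) in enumerate(columns, start=1)
    ]


# Hypothetical example columns, listed in the desired display order.
example_columns = [
    ("Gene_symbol", "Gene symbol"),
    ("Total_alterations_over_subjects", "Total alterations / Subjects in dataset"),
]
config_json = json.dumps(build_column_config(example_columns), indent=2)
```

Keeping the GraphQL-safe field name and the display name as separate values is one way to let display names use spaces or / while the query layer stays spec-compliant.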

What methods do you plan to use to accomplish the scientific goals?

Generate the Excel spreadsheet using the TSV files in the following modules.

cnv-frequencies
fusion-frequencies
rna-seq-expression-summary-stats
snv-frequencies

Each sheet in the Excel spreadsheet is named after one of the following JSONL files.

variant-level-snv-consensus-annotated-mut-freq.jsonl.gz
gene-level-snv-consensus-annotated-mut-freq.jsonl.gz
putative-oncogene-fusion-freq.jsonl.gz
putative-oncogene-fused-gene-freq.jsonl.gz
long_n_tpm_mean_sd_quantile_gene_wise_zscore.jsonl.gz
long_n_tpm_mean_sd_quantile_group_wise_zscore.jsonl.gz
gene-level-cnv-consensus-annotated-mut-freq.jsonl.gz

Each sheet contains the following rows.

Column names in the JSONL/TSV files.
Column names for PedOT table view display.
10 sample rows of table values.
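The sheet layout described above could be assembled along the following lines. This is a minimal sketch using only the Python standard library. The helper names (jsonl_sheet_name, build_sheet_rows, read_sheet_rows) and the display-name mapping are hypothetical, and the display-name row simply defaults to the original column names until clinicians supply alternatives.

```python
import csv
import gzip
import io
from itertools import islice


def jsonl_sheet_name(tsv_filename):
    """Map a TSV filename to the matching JSONL filename used as the sheet name."""
    base = tsv_filename
    for suffix in (".tsv.gz", ".tsv"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    return base + ".jsonl.gz"


def build_sheet_rows(tsv_lines, display_names=None, n_samples=10):
    """Return [column names, display names, up to n_samples value rows]."""
    reader = csv.reader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    mapping = display_names or {}
    # Fall back to the original column name when no display name is given.
    display = [mapping.get(column, column) for column in header]
    samples = list(islice(reader, n_samples))
    return [header, display] + samples


def read_sheet_rows(path, **kwargs):
    """Read a plain or gzip-compressed TSV file and build its sheet rows."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        return build_sheet_rows(handle, **kwargs)


# Small in-memory example with made-up values.
example_tsv = "Gene_symbol\tDataset\n" + "".join(
    f"GENE{i}\tDATASET{i}\n" for i in range(12)
)
example_rows = build_sheet_rows(
    io.StringIO(example_tsv), display_names={"Gene_symbol": "Gene symbol"}
)
```

The resulting rows could then be written to one worksheet per file with a library such as pandas or openpyxl. Excel limits worksheet names to 31 characters, so the full JSONL filenames would likely need to be shortened or mapped to shorter sheet names.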

What input data are required for this analysis?

variant-level-snv-consensus-annotated-mut-freq.tsv.gz
gene-level-snv-consensus-annotated-mut-freq.tsv
putative-oncogene-fusion-freq.tsv.gz
putative-oncogene-fused-gene-freq.tsv.gz
long_n_tpm_mean_sd_quantile_gene_wise_zscore.tsv.gz
long_n_tpm_mean_sd_quantile_group_wise_zscore.tsv.gz
gene-level-cnv-consensus-annotated-mut-freq.tsv.gz

How long do you expect is needed to complete the analysis? Will it be a multi-step analysis?

2-3 days

Who will complete the analysis (please add a GitHub handle here if relevant)?

@logstar

What relevant scientific literature relates to this analysis?

cc @jharenza @jonkiky
