d3b-center / ticket-tracker-OPC

A repo to generate and track tickets for ped OT

Edit Somatic Alteration JSONL field names for consistency #252

Closed zdorman closed 2 years ago

zdorman commented 3 years ago

What data file(s) does this issue pertain to?

SNV by Gene: gene-level-snv-consensus-annotated-mut-freq.jsonl
SNV by Variant: variant-level-snv-consensus-annotated-mut-freq.jsonl
CNV by Gene: gene-level-cnv-consensus-annotated-mut-freq.jsonl
Fusion by Gene: putative-oncogene-fused-gene-freq.jsonl
Fusion: putative-oncogene-fusion-freq.jsonl

What release are you using?

v10

Put your question or report your issue here.

To improve consistency of the MTP ETL process, please consider the following changes to field names in the v11 Somatic Alteration (aka Somatic Mutation) JSONL files. A total of 15 fields (3 analogous fields in each of 5 files) would be affected. When v11 is released with the changes, FNL will update the relevant ids and configuration files.

The 3 affected fields in each file would change as follows:

  1. Replace patients with subjects in all 5 JSONL fields containing patients
  2. Replace altered with mutated in 2 CNV by Gene JSONL fields for consistency
  3. Lowercase all characters except the first in all 15 JSONL field names. (We'd prefer lowerCamelCase, but most of the existing JSONL fields use the underscore format. If it makes sense in the future, we may ask to shift ALL JSONL fields to a camelCase standard, but that's outside the scope of this issue.)
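
As a rough illustration of how the three rules combine, here is a minimal Python sketch. The function name `normalize_field_name` is hypothetical and is not part of any existing module; it is only meant to be applied to the 15 affected field names, not to every field in the files.

```python
def normalize_field_name(name: str) -> str:
    """Apply the three proposed rules to one affected field name.

    Hypothetical helper for illustration only; intended for the 15
    affected fields, not for arbitrary JSONL keys.
    """
    # Rule 3: keep the first character, lowercase everything after it.
    renamed = name[:1] + name[1:].lower()
    # Rule 1: patients -> subjects.
    renamed = renamed.replace("patients", "subjects")
    # Rule 2: altered -> mutated (among the affected fields, this only
    # matches the two CNV by Gene tumor fields).
    renamed = renamed.replace("altered", "mutated")
    return renamed


print(normalize_field_name("Total_mutations_Over_Patients_in_dataset"))
print(normalize_field_name("Total_relapse_tumors_altered_over_Relapse_tumors_in_dataset"))
```

The case rule runs first so that the later substring replacements only need to match lowercase text.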

Full table of changes:

| Source File | Current JSONL field name | Proposed JSONL field name | Changes in addition to case |
| --- | --- | --- | --- |
| SNV by Gene | Total_mutations_Over_Patients_in_dataset | Total_mutations_over_subjects_in_dataset | patients -> subjects |
| SNV by Variant | Total_mutations_Over_Patients_in_dataset | Total_mutations_over_subjects_in_dataset | patients -> subjects |
| CNV by Gene | Total_alterations_over_Patients_in_dataset | Total_alterations_over_subjects_in_dataset | patients -> subjects |
| Fusion by Gene | Total_alterations_Over_Patients_in_dataset | Total_alterations_over_subjects_in_dataset | patients -> subjects |
| Fusion | Total_alterations_Over_Patients_in_dataset | Total_alterations_over_subjects_in_dataset | patients -> subjects |
| SNV by Gene | Total_primary_tumors_mutated_Over_Primary_tumors_in_dataset | Total_primary_tumors_mutated_over_primary_tumors_in_dataset | - |
| SNV by Variant | Total_primary_tumors_mutated_Over_Primary_tumors_in_dataset | Total_primary_tumors_mutated_over_primary_tumors_in_dataset | - |
| CNV by Gene | Total_primary_tumors_altered_over_Primary_tumors_in_dataset | Total_primary_tumors_mutated_over_primary_tumors_in_dataset | altered -> mutated |
| Fusion by Gene | Total_primary_tumors_mutated_Over_Primary_tumors_in_dataset | Total_primary_tumors_mutated_over_primary_tumors_in_dataset | - |
| Fusion | Total_primary_tumors_mutated_Over_Primary_tumors_in_dataset | Total_primary_tumors_mutated_over_primary_tumors_in_dataset | - |
| SNV by Gene | Total_relapse_tumors_mutated_Over_Relapse_tumors_in_dataset | Total_relapse_tumors_mutated_over_relapse_tumors_in_dataset | - |
| SNV by Variant | Total_relapse_tumors_mutated_Over_Relapse_tumors_in_dataset | Total_relapse_tumors_mutated_over_relapse_tumors_in_dataset | - |
| CNV by Gene | Total_relapse_tumors_altered_over_Relapse_tumors_in_dataset | Total_relapse_tumors_mutated_over_relapse_tumors_in_dataset | altered -> mutated |
| Fusion by Gene | Total_relapse_tumors_mutated_Over_Relapse_tumors_in_dataset | Total_relapse_tumors_mutated_over_relapse_tumors_in_dataset | - |
| Fusion | Total_relapse_tumors_mutated_Over_Relapse_tumors_in_dataset | Total_relapse_tumors_mutated_over_relapse_tumors_in_dataset | - |
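
For anyone who needs to bridge v10 and v11 files in the meantime, the table could be applied as an explicit key mapping. The sketch below is only an assumption about how that might look: `RENAMES`, `rename_record`, and `rename_jsonl_lines` are hypothetical names, and the sample record (including the `Gene_symbol` key and its values) is made up for illustration rather than taken from a real file.

```python
import json

# Current -> proposed field names, collected from the table above.
# Spellings that appear in several files are listed once.
RENAMES = {
    "Total_mutations_Over_Patients_in_dataset":
        "Total_mutations_over_subjects_in_dataset",
    "Total_alterations_over_Patients_in_dataset":
        "Total_alterations_over_subjects_in_dataset",
    "Total_alterations_Over_Patients_in_dataset":
        "Total_alterations_over_subjects_in_dataset",
    "Total_primary_tumors_mutated_Over_Primary_tumors_in_dataset":
        "Total_primary_tumors_mutated_over_primary_tumors_in_dataset",
    "Total_primary_tumors_altered_over_Primary_tumors_in_dataset":
        "Total_primary_tumors_mutated_over_primary_tumors_in_dataset",
    "Total_relapse_tumors_mutated_Over_Relapse_tumors_in_dataset":
        "Total_relapse_tumors_mutated_over_relapse_tumors_in_dataset",
    "Total_relapse_tumors_altered_over_Relapse_tumors_in_dataset":
        "Total_relapse_tumors_mutated_over_relapse_tumors_in_dataset",
}


def rename_record(record: dict) -> dict:
    """Return a copy of one JSONL record with the affected keys renamed."""
    return {RENAMES.get(key, key): value for key, value in record.items()}


def rename_jsonl_lines(lines):
    """Yield rewritten JSONL lines, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield json.dumps(rename_record(json.loads(line)))


# Made-up sample record; the key "Gene_symbol" and the values are illustrative.
sample = ['{"Gene_symbol": "X", "Total_mutations_Over_Patients_in_dataset": "1/10"}']
for out in rename_jsonl_lines(sample):
    print(out)
```

An explicit mapping leaves every other field untouched, which is safer than applying case or substring rules to all keys in a record.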

(I realize that some of the CNV by Gene inconsistencies addressed here ultimately stem from my request in #197. Thanks for your patience!)

logstar commented 3 years ago

@zdorman Thank you for creating this issue. Could you also create another issue at https://github.com/CBIIT/ppdc-config/issues to keep track of the required PedOT front-end configuration file changes? The configuration file changes will make the PedOT front-end compatible with the proposed v11 JSONL field names.

cc @jonkiky

ewafula commented 3 years ago

Thanks, @zdorman, I'll update the three modules for the v11 release.

runjin326 commented 2 years ago

Closing with PR149, PR155 and PR156 merged.