AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project

Compile pathology_free_text_diagnosis subtypes #1084

Closed kgaonkar6 closed 3 years ago

kgaonkar6 commented 3 years ago

Purpose/implementation Section

What scientific question is your analysis addressing?

Adding back the individual pathology_free_text_diagnosis subtypes to the compiled_molecular_subtypes.tsv

What was your approach?

There are 2 kinds of updates:

The samples in the files below already have molecular-subtyping results in the compiled file analyses/molecular-subtyping-pathology/results/compiled_molecular_subtypes.tsv, so we will update the values for these samples using outputs from the pathology-free-text-diagnosis terms.

| File | Logic to include as results | Previous subtype | Description ticket |
|---|---|---|---|
| analyses/molecular-subtyping-pathology/results/lgat-pathology-free-text-subtypes.tsv #1060 | Update existing values | LGAT | #1000 |
| analyses/molecular-subtyping-pathology/results/cranio_adam_subtypes.tsv #823 | Update existing values | CRANIO | #994 |
| analyses/molecular-subtyping-pathology/results/glialneuronal_tumor_subtypes.tsv | Update existing values | LGAT | #996 |

The following subtypes can be appended directly to analyses/molecular-subtyping-pathology/results/compiled_molecular_subtypes.tsv, since these samples don't have molecular-subtyping results.

| File | Logic to include as results | Description ticket |
|---|---|---|
| analyses/molecular-subtyping-pathology/results/cns-lymphoma-subtypes.tsv | Append | #1057 |
| analyses/molecular-subtyping-pathology/results/juvenile-xanthogranuloma-subtypes.tsv | Append | #1056 |
| analyses/molecular-subtyping-pathology/results/choroid_plexus_papilloma_subtypes.tsv | Append | #1065 |
| analyses/molecular-subtyping-pathology/results/meningioma_subtypes.tsv | Append | #1013 |
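
The two kinds of update described above can be sketched roughly as follows. This is only an illustration on toy data, not the module's actual code: the column name `Kids_First_Biospecimen_ID`, the sample IDs, and the subtype strings are all assumptions made up for the example.

```r
library(dplyr)

# Toy stand-in for compiled_molecular_subtypes.tsv (values are made up)
compiled <- tibble(
  Kids_First_Biospecimen_ID = c("BS_1", "BS_2", "BS_3"),
  molecular_subtype = c("LGAT, old value", "CRANIO, old value", "OTHER, unchanged")
)

# Kind 1: samples already in the compiled table whose values get replaced
updates <- tibble(
  Kids_First_Biospecimen_ID = c("BS_1", "BS_2"),
  molecular_subtype = c("LGAT, new value", "CRANIO, new value")
)

# Kind 2: samples with no molecular-subtyping result yet, simply appended
additions <- tibble(
  Kids_First_Biospecimen_ID = "BS_4",
  molecular_subtype = "NEW, appended value"
)

compiled_updated <- compiled %>%
  # drop the rows that are about to be superseded
  anti_join(updates, by = "Kids_First_Biospecimen_ID") %>%
  # add back the replacement rows plus the brand-new rows
  bind_rows(updates, additions) %>%
  arrange(Kids_First_Biospecimen_ID)
```

Removing the superseded rows with `anti_join()` before binding keeps one row per sample, so a sample that appears in both the compiled table and an update file is not duplicated.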

What GitHub issue does your pull request address?

#1061

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Is there anything that you want to discuss further?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

tables

What is your summary of the results?

Reproducibility Checklist

Documentation Checklist

kgaonkar6 commented 3 years ago

The following issues were fixed:

jharenza commented 3 years ago

Could you also pop #1108 into this PR since it is in the same module?

https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/fad3359c16e7dbcff5169d08f854769c55f57821/analyses/molecular-subtyping-pathology/01-compile-subtyping-results.Rmd#L283-L285

should be

  mutate(integrated_diagnosis = case_when(molecular_subtype == "CRANIO, ADAM" ~ "Adamantinomatous craniopharyngioma",
                                          molecular_subtype == "CRANIO, PAP" ~ "Papillary craniopharyngioma",
                                          TRUE ~ NA_character_),
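
For context, a self-contained sketch of how the suggested `case_when()` mapping behaves on a small table. The input tibble and its third value are hypothetical; only the two CRANIO labels and their diagnoses come from the suggestion above.

```r
library(dplyr)

# Toy input; "SOME OTHER SUBTYPE" is a made-up value to exercise the fallback
subtypes <- tibble(
  molecular_subtype = c("CRANIO, ADAM", "CRANIO, PAP", "SOME OTHER SUBTYPE")
)

with_dx <- subtypes %>%
  mutate(
    integrated_diagnosis = case_when(
      molecular_subtype == "CRANIO, ADAM" ~ "Adamantinomatous craniopharyngioma",
      molecular_subtype == "CRANIO, PAP" ~ "Papillary craniopharyngioma",
      # anything else gets a typed missing value so the column stays character
      TRUE ~ NA_character_
    )
  )
```

`NA_character_` rather than a bare `NA` keeps every branch of `case_when()` the same type.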
jharenza commented 3 years ago

I also wanted to check that you plan to update molecular-subtyping-integrate after this PR with the updated harmonized_diagnosis, since I do not see those filled in here. If so, I think we may need a ticket for that one?

kgaonkar6 commented 3 years ago

In the latest commit, I've fixed the typo in Adamantinomatous craniopharyngioma and added back meningioma.

About molecular-subtyping-integrate: didn't we decide that we will only run the module per data release so that the pbta-histologies.tsv file matches? Maybe I misunderstood; a ticket would be good.

jharenza commented 3 years ago

That's possible, but I think the way it is set up now is via an excel sheet matching, right, and there needs to be some logic added for updating the harm dx based on free text dx, rather than just pulling over path dx, right?

kgaonkar6 commented 3 years ago

> That's possible, but I think the way it is set up now is via an excel sheet matching, right, and there needs to be some logic added for updating the harm dx based on free text dx, rather than just pulling over path dx, right?

Hmm I don't think the logic needs to be updated in the integrate step.

Any BS_ID already subtyped (i.e., one that exists in compiled_molecular_subtypes_with_clinical_pathology_feedback_and_report_info.tsv) is not going to be part of the code that pulls harm_dx/broad_hist/short_hist from the excel sheet. That part is only for the non-subtyped + pathology_diagnosis=="Other" BS_IDs. We could do a molecular-subtyping-integrate rerun PR to see if that's true, since I could have missed some key logic.
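
A rough sketch of the selection rule described in this comment, on toy data. It is not the integrate module's code: the column name `Kids_First_Biospecimen_ID`, the sample IDs, and the non-"Other" diagnosis value are assumptions for illustration.

```r
library(dplyr)

# Toy histologies table (IDs and the "Meningioma" value are made up)
histologies <- tibble(
  Kids_First_Biospecimen_ID = c("BS_1", "BS_2", "BS_3", "BS_4"),
  pathology_diagnosis = c("Other", "Other", "Meningioma", "Other")
)

# IDs assumed to already appear in the compiled subtype file
subtyped_ids <- c("BS_1")

# Only non-subtyped samples with pathology_diagnosis == "Other"
# would go through the excel-sheet lookup
needs_lookup <- histologies %>%
  filter(
    !Kids_First_Biospecimen_ID %in% subtyped_ids,
    pathology_diagnosis == "Other"
  )
```

Here BS_1 is skipped because it is already subtyped and BS_3 is skipped because its pathology_diagnosis is not "Other", leaving BS_2 and BS_4 for the lookup.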

jharenza commented 3 years ago

gotcha! that makes sense.