README added for molecular subtype integrate module

runjin326 commented 2 years ago

Purpose/implementation Section

What scientific question is your analysis addressing?

Modified README to include explanation for cancer group assignment.

What was your approach?

First showed which cancer groups would just use harmonized_diagnosis. Then showed which cancer groups would be harmonized_diagnosis minus the subtype information. Lastly, showed a table indicating additional modifications to the cancer groups for formatting/simplification reasons.

What GitHub issue does your pull request address?

https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/1246

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Please check to see whether the description is clear enough. We could also directly show a table containing how each harmonized_diagnosis is mapped to cancer_group.

Is there anything that you want to discuss further?

See above.

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes.

Results

What types of results are included (e.g., table, figure)?

README file with description and tables.

What is your summary of the results?

NA

Reproducibility Checklist

[ ] The dependencies required to run the code in this pull request have been added to the project Dockerfile.
[ ] This analysis has been added to continuous integration.

Documentation Checklist

[ ] This analysis module has a README and it is up to date.
[ ] This analysis is recorded in the table in analyses/README.md and the entry is up to date.
[ ] The analytical code is documented and contains comments.

sjspielman commented 2 years ago

Thanks for this @runjin326 ! Going through it now.

I want to note here that this module's notebook itself seems to be missing some important text - line 29 ends in a sentence fragment.

If samples are not processed by a molecular-subtyping-* module then the br

This needs to be fleshed out to enable clear, unambiguous descriptions of the groupings. CC @jharenza @kgaonkar6 , maybe we need a separate issue/PR for this?

sjspielman commented 2 years ago

Hi @runjin326 , I'm seeing some differences between your lists of cancer groups in the updated README and the list I get when I derive the groupings from the code. Something that will help to determine which of our lists is correct is to see the code that generates these diagnosis lists.

This code could be probably be integrated at the end of 01-integrate-subtyping.Rmd so there is a clear record of how information in the README was produced. If you can add a new subsection to this notebook that provides the lists to integrate into README programmatically, then we can be really confident we've got the right diagnoses! When you do this, it would be great if you could add a sessionInfo() chunk to the bottom too. Thanks!

runjin326 commented 2 years ago

@sjspielman - thanks so much for the review. I have now added a separate Rmd showing how the tables in README are generated. (I feel like a separate Rmd is clearer than additional couple lines at the end of 01 script)

I also modified the language in the README a little bit to make it more clear how those lists/tables were generated. Let me know if they all make sense.

sjspielman commented 2 years ago

Hi @runjin326 , adding that separate file is probably the right way to go here. However, I have to say, I am really confused by what this file is and isn't doing. I'm going to loop @jharenza in because I think I must be just missing something here. Possibly what's missing is there is no clear definition of the procedure for cancer groupings in the first place so it's challenging to assess how these files bring that information out. I might suggest bringing in someone with more history with this module to add in some clearer explanations in general?

Either way, my sense was - the goal is here is to precisely show how the cancer groups were derived so it is clear how more granular diagnoses relate to the final cancer groupings. So, I would expect this file to clearly show me which samples or other groupings are coalescing into cancer groups, of which there are 59 (and an NA) in the histologies file. If this is not the goal, I have misunderstood.

jharenza commented 2 years ago

Hi @sjspielman. @runjin326 and I just chatted offline about this and she is going to make some changes because I was also a little confused :). The plan is to create a final table which has all harmonized diagnoses mapped to their new cancer groups and to describe in text what was done to get there. In brief, it was mostly removal of the molecular subtype (if exists) and then collapsing into manual groups which I chose based on what made the most sense for plotting/analysis. There were a few aesthetic/nomenclature changes, which were removals of / and ; and all abbreviations in parentheses to collapse groups/make plotting easier, so we will add that into a paragraph to describe.

runjin326 commented 2 years ago

@jharenza and @sjspielman, I now modified the README to only describe the steps and included one table indicating what additional modifications are. Also wrote out 2 files for harmonized diagnosis and cancer group mapping. Let me know whether this is more clear now.

jaclyn-taroni commented 2 years ago

I filed a PR to this branch: https://github.com/runjin326/OpenPBTA-analysis/pull/1 because I thought having some code to talk about would be helpful when I describe some ideas I had. More info over there, but to summarize my concerns:

We probably want to create tables from the 01 output rather than what's in the data release. There are at least two ways to accomplish that.
I think it would be helpful to have all of the info in has_subtype_removed and additional_modi in a table that anyone can inspect.
It'd be great to save a table specifically for what needs to go into the README. I think that's what I accomplished.

sjspielman commented 2 years ago

Noting that in interest of time I made the requested changes. This is now ready for merge.

AlexsLemonade / OpenPBTA-analysis