Closed runjin326 closed 2 years ago
Thanks for this @runjin326 ! Going through it now.
I want to note here that this module's notebook itself seems to be missing some important text - line 29 ends in a sentence fragment.
If samples are not processed by a molecular-subtyping-* module then the br
This needs to be fleshed out to enable clear, unambiguous descriptions of the groupings. CC @jharenza @kgaonkar6 , maybe we need a separate issue/PR for this?
Hi @runjin326 , I'm seeing some differences between your lists of cancer groups in the updated README and the list I get when I derive the groupings from the code. Something that will help to determine which of our lists is correct is to see the code that generates these diagnosis lists.
This code could be probably be integrated at the end of 01-integrate-subtyping.Rmd
so there is a clear record of how information in the README was produced. If you can add a new subsection to this notebook that provides the lists to integrate into README programmatically, then we can be really confident we've got the right diagnoses!
When you do this, it would be great if you could add a sessionInfo()
chunk to the bottom too. Thanks!
@sjspielman - thanks so much for the review. I have now added a separate Rmd showing how the tables in README are generated. (I feel like a separate Rmd is clearer than additional couple lines at the end of 01 script)
I also modified the language in the README a little bit to make it more clear how those lists/tables were generated. Let me know if they all make sense.
Hi @runjin326 , adding that separate file is probably the right way to go here. However, I have to say, I am really confused by what this file is and isn't doing. I'm going to loop @jharenza in because I think I must be just missing something here. Possibly what's missing is there is no clear definition of the procedure for cancer groupings in the first place so it's challenging to assess how these files bring that information out. I might suggest bringing in someone with more history with this module to add in some clearer explanations in general?
Either way, my sense was - the goal is here is to precisely show how the cancer groups were derived so it is clear how more granular diagnoses relate to the final cancer groupings. So, I would expect this file to clearly show me which samples or other groupings are coalescing into cancer groups, of which there are 59 (and an NA
) in the histologies file. If this is not the goal, I have misunderstood.
Hi @sjspielman. @runjin326 and I just chatted offline about this and she is going to make some changes because I was also a little confused :). The plan is to create a final table which has all harmonized diagnoses mapped to their new cancer groups and to describe in text what was done to get there. In brief, it was mostly removal of the molecular subtype (if exists) and then collapsing into manual groups which I chose based on what made the most sense for plotting/analysis. There were a few aesthetic/nomenclature changes, which were removals of /
and ;
and all abbreviations in parentheses to collapse groups/make plotting easier, so we will add that into a paragraph to describe.
@jharenza and @sjspielman, I now modified the README to only describe the steps and included one table indicating what additional modifications are. Also wrote out 2 files for harmonized diagnosis and cancer group mapping. Let me know whether this is more clear now.
I filed a PR to this branch: https://github.com/runjin326/OpenPBTA-analysis/pull/1 because I thought having some code to talk about would be helpful when I describe some ideas I had. More info over there, but to summarize my concerns:
01
output rather than what's in the data release. There are at least two ways to accomplish that.has_subtype_removed
and additional_modi
in a table that anyone can inspect.Noting that in interest of time I made the requested changes. This is now ready for merge.
Purpose/implementation Section
What scientific question is your analysis addressing?
Modified README to include explanation for cancer group assignment.
What was your approach?
First showed which cancer groups would just use
harmonized_diagnosis
. Then showed which cancer groups would beharmonized_diagnosis
minus the subtype information. Lastly, showed a table indicating additional modifications to the cancer groups for formatting/simplification reasons.What GitHub issue does your pull request address?
https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/1246
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
Please check to see whether the description is clear enough. We could also directly show a table containing how each
harmonized_diagnosis
is mapped tocancer_group
.Is there anything that you want to discuss further?
See above.
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Yes.
Results
What types of results are included (e.g., table, figure)?
README file with description and tables.
What is your summary of the results?
NA
Reproducibility Checklist
Documentation Checklist
README
and it is up to date.analyses/README.md
and the entry is up to date.