Closed jaclyn-taroni closed 2 years ago
I think I misunderstood when we talked about this - I was thinking we would have a separate set of colors for the oncoprint to ensure we are still plotting all cancer groups with mutations. This will also ensure that our plots match pedcbio for the genes listed so that people don't get confused if they check our figures vs pedcbio. Thoughts?
My take in https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/1174#issuecomment-915298938 was that we could include cancer_group-specific oncoprints for groups that didn't meet the sample size criteria:
I think I have to check on which cancer_group are >=10, but I think for all figures other than the oncoprint, this may be fine. For the oncoprint, we should still annotate as the specific cancer.
If the specific cancer does not meet these criteria, we could include the oncoprint in the supplemental material instead and possibly even split up by cancer_group, rather than broad_histology, which would allow us to avoid having to worry about colors for specific cancer groups with N < 10.
And more generally in https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/1174#issuecomment-915666312
And for cancer_group or broad_histology labels that don't make the cutoff based on sample size, we should devise ways to break those plots out individually as needed.
This is what I meant by
We also might consider creating individual oncoprint plots for cancer_group with n < 10.
above, as well. We could then assemble the individual, cancer_group-specific plots into a multipanel supplemental figure.
I think that this will solve for this point:
This will also ensure that our plots match pedcbio for the genes listed so that people don't get confused if they check our figures vs pedcbio.
That is, provided we have a very clear legend that explains why certain cancer_group values will not be represented and points to the relevant supplemental figure. If not, why not? I'm less familiar with pedcbioportal than you are.
I was thinking we would have a separate set of colors for the oncoprint to ensure we are still plotting all cancer groups with mutations.
To me, this seems more difficult to pull off than the 2 separate oncoprint figures. I think it is very useful to keep the color palette consistent across figures, which is the rationale for the minimal palette in #1176. It was difficult to get the 18-color palette we have for cancer_group, with the values related to the broad_histology by hue/saturation.
Functionally: if we add additional colors to the cancer_group palette for the oncoprint only, that could work and maybe become its own column in the histology label palette (e.g., add oncoprint_hex to the table described in #1176).
I am not anxious to pick additional colors myself though; we are in diminishing returns territory in my opinion.
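To make the oncoprint_hex idea concrete, here is a minimal sketch in Python, used purely for illustration. The row layout, the `hex` column name, the use of `None` for groups without a shared color, and the `add_oncoprint_hex` helper are all assumptions; the actual table is the one described in #1176, and the colors and group names below are made up.

```python
def add_oncoprint_hex(palette_rows, extra_colors):
    """Return copies of the palette rows with an added oncoprint_hex column.

    Assumption: each row is a dict with "cancer_group" and "hex" keys, and
    "hex" is None for groups that have no color in the shared palette.
    """
    updated = []
    for row in palette_rows:
        new_row = dict(row)  # copy so the shared palette rows are untouched
        if row["hex"] is not None:
            # Reuse the shared color so figures stay consistent.
            new_row["oncoprint_hex"] = row["hex"]
        else:
            # Oncoprint-only color; None if no extra color was picked.
            new_row["oncoprint_hex"] = extra_colors.get(row["cancer_group"])
        updated.append(new_row)
    return updated


# Made-up rows and colors, for illustration only.
palette_rows = [
    {"cancer_group": "Group A", "hex": "#1f77b4"},
    {"cancer_group": "Group B", "hex": None},
]
extra_colors = {"Group B": "#aec7e8"}

oncoprint_palette = add_oncoprint_hex(palette_rows, extra_colors)
for row in oncoprint_palette:
    print(row["cancer_group"], row["oncoprint_hex"])
```

Groups that already have a shared color keep it, so only the extra groups need new colors, and every other figure can keep reading the original column.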
hi @jaclyn-taroni - coming back to this - I have just put in a new issue for adding Ns per group for manuscript writing.
I am not sure we should be removing groups with n < 10 from our oncoprints. I don't know how many samples are not mutated, or which of those would still let us visualize the cancer groups in the oncoprint, but if we look at mutated-only samples in the oncoprints currently, we would only have 2/3 groups displayed in A, 1/4 in B, 2/2 in C, and 3/16 in D. We really want to highlight here that we have many of those rare tumors that are scarce, and try to make the argument that we all need to put our data together to make a difference in these cancer types. Maybe we can chat about this a bit more?
I'm going to call this closed via #1200. We can open a new issue if needed!
What analysis module should be updated and why?
oncoprint-landscape
What changes need to be made? Please provide enough detail for another participant to make the update.
With the changes coming in #1176, we will need to remove any cancer_group with n < 10 from the oncoprints for main display. I know there is logic in this module that is pretty tightly tied to broad_histology values, so there may be a non-trivial amount of work to get this done.
We also might consider creating individual oncoprint plots for cancer_group with n < 10.
Who will complete the updated analysis?
I expect I will complete the analysis and if not, I will need to update this with more specific instructions.
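The n < 10 split described under "What changes need to be made?" can be sketched as follows. This is a minimal illustration in Python, not the module's actual code: the `split_groups_by_sample_size` helper, the input format (one cancer_group label per sample), and the example data are all assumptions.

```python
from collections import Counter

MIN_SAMPLES = 10  # cutoff discussed in this issue


def split_groups_by_sample_size(sample_groups, min_samples=MIN_SAMPLES):
    """Split cancer_group labels into main-display and supplemental sets.

    Assumption: sample_groups holds one cancer_group label per sample.
    Returns (main_groups, supplemental_groups) as sorted lists.
    """
    counts = Counter(sample_groups)
    # Groups at or above the cutoff stay in the main oncoprint.
    main = sorted(g for g, n in counts.items() if n >= min_samples)
    # Smaller groups are candidates for individual supplemental plots.
    supplemental = sorted(g for g, n in counts.items() if n < min_samples)
    return main, supplemental


# Made-up labels, for illustration only.
sample_groups = ["Group A"] * 12 + ["Group B"] * 3 + ["Group C"] * 10

main_groups, supplemental_groups = split_groups_by_sample_size(sample_groups)
print(main_groups)
print(supplemental_groups)
```

Keeping the cutoff as a parameter makes it easy to revisit if the sample size criteria change.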