AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project

Updated analysis: Oncoprints to use new minimal display palettes, remove cancer group with n < 10 #1177

Closed: jaclyn-taroni closed this issue 2 years ago

jaclyn-taroni commented 3 years ago

What analysis module should be updated and why?

oncoprint-landscape

What changes need to be made? Please provide enough detail for another participant to make the update.

With the changes coming in #1176, we will need to remove any cancer_group with n < 10 from the oncoprints for main display. I know there is logic in this module that is pretty tightly tied to broad_histology values, so there may be a non-trivial amount of work to get this done.

We also might consider creating individual oncoprint plots for cancer_group with n < 10.
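A minimal sketch of the split described above. It counts samples per cancer_group, keeps groups with n >= 10 for the main oncoprint, and collects the sample IDs of the remaining groups so each can get its own supplemental oncoprint. This is illustrative logic in Python, not the module's actual code. The input format (a list of (sample_id, cancer_group) pairs), the function name split_cancer_groups, and the toy group labels are hypothetical.

```python
from collections import Counter

# Minimum cancer_group size for the main oncoprint (the n < 10 cutoff in this issue)
MIN_GROUP_SIZE = 10

def split_cancer_groups(samples, min_size=MIN_GROUP_SIZE):
    """Partition samples by cancer_group size.

    samples: list of (sample_id, cancer_group) pairs (assumed input format).
    Returns the sorted main-display groups, plus a dict mapping each small
    group to its sample IDs for an individual supplemental oncoprint.
    """
    counts = Counter(group for _, group in samples)
    main = sorted(g for g, n in counts.items() if n >= min_size)
    supplemental = {
        g: [sid for sid, grp in samples if grp == g]
        for g in sorted(counts)
        if counts[g] < min_size
    }
    return main, supplemental

# Toy data with hypothetical group labels, not real OpenPBTA counts
samples = [(f"S{i}", "group_a") for i in range(12)] + [(f"T{i}", "group_b") for i in range(4)]
main, supplemental = split_cancer_groups(samples)
print(main)                # ['group_a']
print(list(supplemental))  # ['group_b']
```

Keeping the small groups' sample lists, rather than discarding them, is what would let the individual plots be assembled into a multipanel supplemental figure later.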

Who will complete the updated analysis?

I expect to complete the analysis myself; if not, I will update this issue with more specific instructions.

jharenza commented 3 years ago

I think I misunderstood when we talked about this - I was thinking we would have a separate set of colors for the oncoprint to ensure we are still plotting all cancer groups with mutations. This will also ensure that our plots match pedcbio for the genes listed so that people don't get confused if they check our figures vs pedcbio. Thoughts?

jaclyn-taroni commented 3 years ago

My take in https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/1174#issuecomment-915298938 was that we could include cancer_group-specific oncoprints for groups that didn't meet the sample size criteria:

I think I have to check on which cancer_group are >=10, but I think for all figures other than the oncoprint, this may be fine. For the oncoprint, we should still annotate as the specific cancer.

If the specific cancer does not meet this criteria, we could include the oncoprint in the supplemental material instead and possibly even split up by cancer_group, rather than broad_histology, which would allow us to avoid having to worry about colors for specific cancer groups with N < 10.

And more generally in https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/1174#issuecomment-915666312

And for cancer_group or broad_histology labels that don't make the cutoff based on sample size, we should devise ways to break those plots out individually as needed.

This is what I meant by

We also might consider creating individual oncoprint plots for cancer_group with n < 10.

above, as well. We could then assemble the individual, cancer_group-specific plots into a multipanel supplemental figure.

I think that this will solve for this point:

This will also ensure that our plots match pedcbio for the genes listed so that people don't get confused if they check our figures vs pedcbio.

This should work as long as we include a very clear legend that explains why certain cancer_group values are not represented and points to the relevant supplemental figure. If you think it won't, why not? I'm less familiar with PedcBioPortal than you are.


I was thinking we would have a separate set of colors for the oncoprint to ensure we are still plotting all cancer groups with mutations.

To me, this seems more difficult to pull off than two separate oncoprint figures. I think it is very useful to keep the color palette consistent across figures, which is the rationale for the minimal palette in #1176. It was difficult to get the 18-color palette we have for cancer_group, with each value related to its broad_histology by hue and saturation.

Functionally, adding colors to the cancer_group palette for the oncoprint only could work, and those colors could become their own column in the histology label palette (e.g., add oncoprint_hex to the table described in #1176).
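If oncoprint-only colors became their own column, the oncoprint could fall back to the shared palette whenever no oncoprint-specific color is set, so groups with n >= 10 keep the same color as in every other figure. A minimal Python sketch, assuming a tab-separated palette table with columns cancer_group, cancer_group_hex, and oncoprint_hex. Only oncoprint_hex is proposed in this thread; the other column names, the rows, and the hex values are hypothetical.

```python
import csv
import io

# Hypothetical excerpt of the histology label palette table; values are placeholders.
# group_a (n >= 10) has a shared color; group_b (n < 10) has an oncoprint-only color.
PALETTE_TSV = "cancer_group\tcancer_group_hex\toncoprint_hex\n" \
              "group_a\t#1F77B4\t\n" \
              "group_b\t\t#FF7F0E\n"

def oncoprint_colors(tsv_text):
    """Map each cancer_group to the hex color used in the oncoprint."""
    colors = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # Prefer the oncoprint-only color; otherwise reuse the shared palette color
        hex_color = row["oncoprint_hex"] or row["cancer_group_hex"]
        if not hex_color:
            raise ValueError(f"no color defined for {row['cancer_group']}")
        colors[row["cancer_group"]] = hex_color
    return colors

print(oncoprint_colors(PALETTE_TSV))  # {'group_a': '#1F77B4', 'group_b': '#FF7F0E'}
```

Raising on a missing color, rather than silently assigning a default, makes it obvious when a newly displayed group still needs a color picked.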

I am not anxious to pick additional colors myself though; we are in diminishing returns territory in my opinion.

jharenza commented 3 years ago

Hi @jaclyn-taroni, coming back to this: I have just filed a new issue for adding Ns per group for manuscript writing.

I am not sure we should be removing groups with n < 10 from our oncoprints. I don't know how many samples are unmutated, and including those might still allow the cancer groups to be visualized in the oncoprint. However, if we look only at mutated samples in the current oncoprints, we would display only 2/3 groups in A, 1/4 in B, 2/2 in C, and 3/16 in D. We really want to highlight here that we have many of these rare tumors, and make the argument that we all need to pool our data to make a difference in these cancer types. Maybe we can chat about this a bit more?
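One way to compute per-panel fractions like these, assuming (as an interpretation, not confirmed in the thread) that the n >= 10 cutoff is applied to mutated samples only: for each panel, count the groups with at least 10 mutated samples out of all groups currently shown. The function name, input format, and toy data are hypothetical, and the example does not reproduce the 2/3, 1/4, 2/2, and 3/16 values above.

```python
from collections import Counter

def groups_retained(mutated_samples, panel_groups, min_size=10):
    """Return (retained, total) cancer_group counts for one oncoprint panel.

    mutated_samples: (sample_id, cancer_group) pairs for samples with at least
    one alteration in the panel's genes (assumed input format).
    panel_groups: every cancer_group currently shown in the panel.
    """
    counts = Counter(group for _, group in mutated_samples)
    retained = [g for g in panel_groups if counts.get(g, 0) >= min_size]
    return len(retained), len(panel_groups)

# Toy panel: group_c has no mutated samples at all, so it counts as 0
mutated = [(f"S{i}", "group_a") for i in range(15)] + [(f"T{i}", "group_b") for i in range(3)]
print(groups_retained(mutated, ["group_a", "group_b", "group_c"]))  # (1, 3)
```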

jaclyn-taroni commented 2 years ago

I'm going to call this closed via #1200. We can open a new issue if needed!