AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project

Update broad histology and cancer group palettes, etc. to reflect v22 release #1401

Closed jaclyn-taroni closed 2 years ago

jaclyn-taroni commented 2 years ago

Purpose/implementation Section

We need to accommodate additional cancer groups PAST, PXA, and SEGA (#1368). Here's what I came up with:

⚠️ needs to get updated to use the v22 release

sjspielman commented 2 years ago

I think this is about as good as we're going to get for separating out the colors!

jaclyn-taroni commented 2 years ago

I've addressed #1403 here now as well.

jharenza commented 2 years ago

Note, as of 294b408, this passes CI since #1452 is stacked on this and passes.

jaclyn-taroni commented 2 years ago

> So it looks like maybe we missed these LGG samples previously (maybe because of N when forming the original cancer group hex codes) and they should now go into Other low-grade gliomas so we have a complete group for displays. Likewise, the Embryonal tumor with multilayer rosettes above would go into CNS Embryonal tumor cancer group display. I think you can use broad_histology_display to work in this logic. Can you update cancer_group_display when broad_histology_display == HGG, LGG, or embryonal to move those samples into the general display groups?

We can do this, but functionally I think this means that cancer_group_display will be somewhere between broad_histology and cancer_group in terms of specificity, instead of a representation of cancer_group where any cancer group that doesn't meet the sample size threshold is set to Other (as originally intended).

We're probably going to need a table in figures/README that explains the mapping:

| cancer_group_display | Included cancer groups |
|---|---|

jaclyn-taroni commented 2 years ago

I'm working on this now, using broad_histology values to guide the collapsing into Other categories. Oligodendroglioma might complicate our plans a bit, as there are cases where samples with Oligodendroglioma in cancer_group would end up split into Other low-grade gliomas and Other high-grade gliomas.

I'll push what I have shortly so people can see for themselves.
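The collapsing described above can be sketched roughly as follows. This is a minimal illustration, not the code in this pull request: the function name, the `min_samples` parameter, the `"LGG"`/`"HGG"`/`"Embryonal"` keys, and the sample records are all assumptions made for the example. Only the display labels are taken from the discussion.

```python
from collections import Counter

# Assumed lookup from a (hypothetical) broad_histology_display value to the
# collapsed label. The labels come from the thread; the keys are illustrative.
OTHER_LABEL_BY_BROAD_HISTOLOGY = {
    "LGG": "Other low-grade gliomas",
    "HGG": "Other high-grade gliomas",
    "Embryonal": "CNS Embryonal tumor",
}


def add_cancer_group_display(samples, min_samples):
    """Return copies of the sample records with a cancer_group_display field.

    Cancer groups with at least `min_samples` samples keep their own name.
    Smaller groups are collapsed using the sample's broad histology.
    """
    counts = Counter(sample["cancer_group"] for sample in samples)
    labeled = []
    for sample in samples:
        if counts[sample["cancer_group"]] >= min_samples:
            display = sample["cancer_group"]
        else:
            display = OTHER_LABEL_BY_BROAD_HISTOLOGY.get(
                sample["broad_histology_display"], "Other"
            )
        labeled.append({**sample, "cancer_group_display": display})
    return labeled


# Made-up records: two Oligodendroglioma samples with different broad histologies.
samples = [
    {"id": "s1", "broad_histology_display": "LGG", "cancer_group": "Pilocytic astrocytoma"},
    {"id": "s2", "broad_histology_display": "LGG", "cancer_group": "Pilocytic astrocytoma"},
    {"id": "s3", "broad_histology_display": "LGG", "cancer_group": "Pilocytic astrocytoma"},
    {"id": "s4", "broad_histology_display": "LGG", "cancer_group": "Oligodendroglioma"},
    {"id": "s5", "broad_histology_display": "HGG", "cancer_group": "Oligodendroglioma"},
]

labeled = add_cancer_group_display(samples, min_samples=3)
display_by_id = {s["id"]: s["cancer_group_display"] for s in labeled}
```

With these made-up records, the two Oligodendroglioma samples land in different display groups, which is the split mentioned above.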

jaclyn-taroni commented 2 years ago

We're going to need a different approach for the oncoprint palettes as well. I'm pushing what I have with that commented out.

jashapiro commented 2 years ago

> Oligodendroglioma might complicate our plans a bit, as there are cases where samples with Oligodendroglioma in cancer_group would end up split into Other low-grade gliomas and Other high-grade gliomas.

This seems to me like it might be the correct result, at least for the purposes of many of the display figures where we are distinguishing HGG and LGG. The limitation is that we will need to update any joining between the histology file and palette file to use both broad histology and cancer group. But we may need to do that anyway if we are going to make the "Other LGG" and "Other HGG" labels follow expectations in the case where we do not separately label all possible cancer groups within LGG/HGG.
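To illustrate why the join needs both columns, here is a small sketch that uses plain dictionaries in place of the real histology and palette files. The row contents, hex codes, and function names are placeholders invented for the example, not values from the project.

```python
# Hypothetical palette rows; the hex codes are placeholders, not project colors.
palette_rows = [
    {"broad_histology": "LGG", "cancer_group": "Oligodendroglioma",
     "cancer_group_display": "Other low-grade gliomas", "hex": "#111111"},
    {"broad_histology": "HGG", "cancer_group": "Oligodendroglioma",
     "cancer_group_display": "Other high-grade gliomas", "hex": "#222222"},
]


def index_palette(rows):
    """Index palette rows by the (broad_histology, cancer_group) pair."""
    index = {}
    for row in rows:
        key = (row["broad_histology"], row["cancer_group"])
        if key in index:
            raise ValueError(f"duplicate palette key: {key}")
        index[key] = row
    return index


def join_palette(histology_rows, palette_index):
    """Attach display label and color to each histology row using both keys."""
    joined = []
    for row in histology_rows:
        key = (row["broad_histology"], row["cancer_group"])
        match = palette_index.get(key)
        if match is None:
            raise KeyError(f"no palette entry for {key}")
        joined.append({
            **row,
            "cancer_group_display": match["cancer_group_display"],
            "hex": match["hex"],
        })
    return joined


# Made-up histology rows sharing one cancer_group but not one broad_histology.
histology_rows = [
    {"sample": "a", "broad_histology": "LGG", "cancer_group": "Oligodendroglioma"},
    {"sample": "b", "broad_histology": "HGG", "cancer_group": "Oligodendroglioma"},
]

joined = join_palette(histology_rows, index_palette(palette_rows))

# Keying on cancer_group alone would be ambiguous for these rows.
cancer_groups_only = [row["cancer_group"] for row in palette_rows]
ambiguous = len(set(cancer_groups_only)) < len(cancer_groups_only)
```

In this toy data a join on `cancer_group` alone could not tell the two Oligodendroglioma rows apart, while the two-column key resolves each sample to a single display group.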

jharenza commented 2 years ago

Agree with above, this is expected and we have been joining on both in many of the module updates.

@jaclyn-taroni what do you mean by updates for oncoprint? I was thinking we'd only be adding distinguishing colors for the LGAT tumor plot, but maybe you had something else in mind?

jaclyn-taroni commented 2 years ago

> Agree with above, this is expected and we have been joining on both in many of the module updates.

If that is the case, that assuages my concern that this is a bigger bite than I was hoping for. But I think it's still a bigger bite than I was hoping for because...

> what do you mean by updates for oncoprint? I was thinking we'd only be adding distinguishing colors for the LGAT tumor plot, but maybe you had something else in mind?

There will need to be a rewrite of how we handle cancer group colors for oncoprints. Now we're collapsing multiple cancer groups into, e.g., "Other low-grade gliomas." Previously, each cancer group under the "Other low-grade gliomas" umbrella would have gotten a random color from a greys palette in the oncoprint. Should we still be showing the individual cancer groups in the oncoprint? My assumption is yes = we need to rewrite how the oncoprint display palette is generated. And then what do we do about the "Low-grade glioma astrocytoma" --> "Other low-grade gliomas" in the oncoprint context? Maybe that's still fine to do.

jharenza commented 2 years ago

> Should we still be showing the individual cancer groups in the oncoprint? My assumption is yes = we need to rewrite how the oncoprint display palette is generated. And then what do we do about the "Low-grade glioma astrocytoma" --> "Other low-grade gliomas" in the oncoprint context? Maybe that's still fine to do.

I think that we might do something like: instead of the old HGG label, it'll now be "other HGG" and for any groups not colored in the LGG plot (non SEGA, pilocytic, pxa) and which were previously in the general LGG label, that would become "other LGGs". That seems easier than greys for every group, esp if we have a very small N in those groups?

jaclyn-taroni commented 2 years ago

> I think that we might do something like: instead of the old HGG label, it'll now be "other HGG" and for any groups not colored in the LGG plot (non SEGA, pilocytic, pxa) and which were previously in the general LGG label, that would become "other LGGs". That seems easier than greys for every group, esp if we have a very small N in those groups?

Okay, to clarify – for the LGAT oncoprint, we'd expect the following groupings:

| oncoprint_display (same as cancer_group_display currently) | cancer_group |
|---|---|
| Other low-grade gliomas | Low-grade glioma astrocytoma |
| Other low-grade gliomas | Gliomatosis cerebri |
| Other low-grade gliomas | Diffuse fibrillary astrocytoma |
| Other low-grade gliomas | Oligodendroglioma |
| Subependymal Giant Cell Astrocytoma | Subependymal Giant Cell Astrocytoma |
| Pilocytic astrocytoma | Pilocytic astrocytoma |
| Ganglioglioma | Ganglioglioma |
| Pleomorphic xanthoastrocytoma | Pleomorphic xanthoastrocytoma |

Is that correct?
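For reference, the proposed LGAT grouping can be written as a plain lookup. The variable and function names here are made up for illustration; the entries mirror the grouping listed above.

```python
from collections import defaultdict

# cancer_group -> oncoprint_display, as proposed for the LGAT oncoprint.
ONCOPRINT_DISPLAY = {
    "Low-grade glioma astrocytoma": "Other low-grade gliomas",
    "Gliomatosis cerebri": "Other low-grade gliomas",
    "Diffuse fibrillary astrocytoma": "Other low-grade gliomas",
    "Oligodendroglioma": "Other low-grade gliomas",
    "Subependymal Giant Cell Astrocytoma": "Subependymal Giant Cell Astrocytoma",
    "Pilocytic astrocytoma": "Pilocytic astrocytoma",
    "Ganglioglioma": "Ganglioglioma",
    "Pleomorphic xanthoastrocytoma": "Pleomorphic xanthoastrocytoma",
}


def cancer_groups_by_display(mapping):
    """Invert the lookup so each display label lists its cancer groups."""
    grouped = defaultdict(list)
    for cancer_group, display in mapping.items():
        grouped[display].append(cancer_group)
    return dict(grouped)


grouped = cancer_groups_by_display(ONCOPRINT_DISPLAY)
```

Inverting the lookup gives five display labels, with four cancer groups collapsed under "Other low-grade gliomas".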

jharenza commented 2 years ago

Looks right!

jaclyn-taroni commented 2 years ago

Okay, functionally, I believe we no longer need an oncoprint-specific palette at this point. I will take it out in the interest of moving things along, but I imagine we may need to revisit when we revise the oncoprints to reflect the v22 release.