Figure 4E: Add hypermutator mut sigs heatmap

jharenza commented 2 years ago

Purpose/implementation Section

What scientific question is your analysis addressing?

Stemming from #1248, we decided a visualization + a table would be better than a table alone.

What was your approach?

This PR creates a heatmap of the mutational signatures for the patients who have tumors which are hypermutant or ultra-hypermutant.

What GitHub issue does your pull request address?

Stemmed from the analysis for #1248

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Annotation/legend labels may need to be cleaned up; I couldn't immediately figure out how to do this within the pheatmap function, so for now, they are being renamed within the dataframe.

Is there anything that you want to discuss further?

Do we like the color being used for the heatmap? I picked divergent_high_5 from here because the binary palette red was too bright for my liking and the green gradient palette wasn't super aesthetically pleasing no matter how many greens I used. The blue from binary is nice (see below for both versions) but may blend too much with the color for "Initial CNS Tumor", so what color should we use here? Since these go from zero to 1, I did not think coloring the 0 made much sense as it would for a set of values which go from [-X,0,+X].

hypermutator_sigs_heatmap_red.pdf hypermutator_sigs_heatmap_blue.pdf

How do we want to color patients? Do we want to list them in the legend? I tried using the divergent low palette, but I thought it would blend too much with the blue heatmap.
FYI, one sample is different from the pre-PR heatmap I showed and that was because I had taken a cell line instead of tumor for the plot. oops.
Maybe we should make "Mutator" --> "Mutation status"? @sjspielman

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

yes

Results

What types of results are included (e.g., table, figure)?

What is your summary of the results?

Reproducibility Checklist

[x] The dependencies required to run the code in this pull request have been added to the project Dockerfile.
[x] This analysis has been added to continuous integration.

Documentation Checklist

[x] This analysis module has a README and it is up to date.
[x] This analysis is recorded in the table in analyses/README.md and the entry is up to date.
[x] The analytical code is documented and contains comments.

jaclyn-taroni commented 2 years ago

I'm going to let @sjspielman review this, but feel free to mention me if you need me!

sjspielman commented 2 years ago

This looks good! I think this figure is a really concise way to communicate this information. A couple comments to start -

Is it possible to add a label for the color gradient legend for signature exposure?
- I see you tried to use gradient_col_palette.tsv to set this scale, but if it didn't look great then let's stick with this.
The Patient colors are not easily distinguished in colorblind situations (I checked with https://www.color-blindness.com/coblis-color-blindness-simulator/). Maybe we can use Okabe Ito here to be sure, or something from RColorBrewer
- Similarly, can we order the Patient legend based on the order the patients appear in the heatmap? I would guess this is a factor() thing
Is it possible to add a little more spacing between the legends and the heatmap itself? Doesn't need to be much, but it's a little bunched together now
This is failing CI almost certainly because of the CI subsetted data. We can get around this with a param to not run the heatmap if we're in CI. I did this in the mutational-signatures/07-plot_cns_fit.Rmd notebook for reference, to deal with an analogous situation

Maybe we should make "Mutator" --> "Mutation status"?

This is absolutely your call about which phrasing makes more sense!

so for now, they are being renamed within the dataframe

This is absolutely how I would do it too :)

Since this pub-ready figure is being added to the module, we also need to a line in figures/generate_figures.sh copying this final PDF figure into figures/pdfs/fig4/panels.

jharenza commented 2 years ago

Is it possible to add a label for the color gradient legend for signature exposure? I see you tried to use gradient_col_palette.tsv to set this scale, but if it didn't look great then let's stick with this.

I can try to look into this tomorrow

The Patient colors are not easily distinguished in colorblind situations (I checked with https://www.color-blindness.com/coblis-color-blindness-simulator/). Maybe we can use Okabe Ito here to be sure, or something from RColorBrewer

I updated these

Similarly, can we order the Patient legend based on the order the patients appear in the heatmap? I would guess this is a factor() thing

Will look into this tomorrow

Is it possible to add a little more spacing between the legends and the heatmap itself? Doesn't need to be much, but it's a little bunched together now

Yeah, I don't like this, either, but am not sure if this is possible - did some searching, but can try to search some more tomorrow

This is failing CI almost certainly because of the CI subsetted data. We can get around this with a param to not run the heatmap if we're in CI. I did this in the mutational-signatures/07-plot_cns_fit.Rmd notebook for reference, to deal with an analogous situation

thanks, updated this - let's see

sjspielman commented 2 years ago

The legend spacing indeed looks like it may not be possible. I wonder if it's possible to export them separately and then we can position them later with Illustrator? I know ComplexHeatmap can export separately but don't know enough about pheatmap.

jharenza commented 2 years ago

The legend spacing indeed looks like it may not be possible. I wonder if it's possible to export them separately and then we can position them later with Illustrator? I know ComplexHeatmap can export separately but don't know enough about pheatmap.

Yeah was thinking of a separate export. I don't know much about either, 😆, but I can check it out.

jharenza commented 2 years ago

Hi @sjspielman. I updated to complexheatmap, but couldn't easily figure out how to cut the rows between patients. There is a newer version of complexheatmp (2.5.2) which has pheatmap-like commands which could probably easily handle this, but I didn't want to install and update a lot of dependencies in case that messed up other modules. Also, it looks like the legend is centered, but would probably look better positioned at the top. If you are able to take a look, that'd be great.

sjspielman commented 2 years ago

@jharenza I gave this a go, and ended up with a new strategy (we can always revert!) where I've extract the legends as one horizontal row separately from the heatmap, and I made each of these the same width for stacking the heatmap on top of the legends in Illustrator. What do you think of this approach? It can also be modified to stack the legends vertically, but the packLegend() function is what allows us to space them out. Of course we could export them each separately, but that's a lot more for Illustrator.

jharenza commented 2 years ago

@jharenza I gave this a go, and ended up with a new strategy (we can always revert!) where I've extract the legends as one horizontal row separately from the heatmap, and I made each of these the same width for stacking the heatmap on top of the legends in Illustrator. What do you think of this approach? It can also be modified to stack the legends vertically, but the packLegend() function is what allows us to space them out. Of course we could export them each separately, but that's a lot more for Illustrator.

Thanks! I am ok with this method!

I still would like to create splits between patients so one can more easily look at initial/progressive tumors within patients. Do you think it is OK not having BS_ids in the plot? I suppose I can add these via legend or not mention (we don't in oncoprint). I moved annotations to the right so patients are the first to show up, making it a bit easier to visualize by patient. I will look into cutting the plot.

sjspielman commented 2 years ago

Do you think it is OK not having BS_ids in the plot?

I think it's fine either way, but I don't think we need both the BS_ids and the colors for them.

jharenza commented 2 years ago

woohooo! got the row split!

jharenza commented 2 years ago

@sjspielman I think this is ready for re-review. Wasn't sure if you wanted me to run the generate figures script or wait until review is completed or you add it in #1286

sjspielman commented 2 years ago

Wasn't sure if you wanted me to run the generate figures script

Definitely do not run this script. It's a behemoth in process of being updated.. We'll do a whole run of it on AWS when all the figure scripts are finalized since it will take a long time to run (possibly over a day with all the analyses that get re-run).

AlexsLemonade / OpenPBTA-analysis