alanocallaghan / scater

Clone of the Bioconductor repository for the scater package.
https://bioconductor.org/packages/devel/bioc/html/scater.html

Support box plot in addition to violin plot #208

Closed TuomasBorman closed 4 months ago

TuomasBorman commented 4 months ago

Related to this issue: https://github.com/alanocallaghan/scater/issues/207

This PR adds support for box plots. In addition to a violin plot, the user can now choose to visualize the data with a box plot by specifying layout = "box". I tried to follow your coding style, and this should be the minimum set of modifications needed to get the support.

Here are examples of the functionality:

library(scater)

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)

# Add per-cell QC metrics (e.g. sum, detected) to the column data
colData(example_sce) <- cbind(colData(example_sce), perCellQCMetrics(example_sce))

plots <- list()
plots[[1]] <- plotColData(example_sce, y = "Treatment", x = "sum", colour_by = "Mutation_Status", layout = "box")
plots[[2]] <- plotColData(example_sce, y = "Treatment", x = "sum", colour_by = "Mutation_Status")
plots[[3]] <- plotColData(example_sce, y = "detected", x = "Cell_Cycle", colour_by = "Mutation_Status", layout = "test")
plots[[4]] <- plotColData(example_sce, y = "detected", x = "Cell_Cycle", colour_by = "Mutation_Status", layout = "violin")
# Expression plots
plots[[5]] <- plotExpression(example_sce, rownames(example_sce)[1:5])
plots[[6]] <- plotExpression(example_sce, c("Gene_0001", "Gene_0004"), x="Mutation_Status", layout = "violin")
plots[[7]] <- plotExpression(example_sce, rownames(example_sce)[1:5], layout = "box", point_alpha = 0.1, show_se = TRUE)
plots[[8]] <- plotExpression(example_sce, c("Gene_0001", "Gene_0004"), x="Mutation_Status", layout = "box", show_smooth = TRUE)
# Add per-feature QC metrics (e.g. mean) to the row data
rowData(example_sce) <- cbind(rowData(example_sce), perFeatureQCMetrics(example_sce))

plots[[9]] <- plotRowData(example_sce, y="mean", show_median = TRUE)
plots[[10]] <- plotRowData(example_sce, y="mean", layout = "box", show_violin = TRUE)

library(patchwork)

wrap_plots(plots)

image

-Tuomas

TuomasBorman commented 4 months ago
sce <- example_sce[1, ]
plots <- list()
plots[[1]] <- plotColData(sce, y = "detected", x = "Cell_Cycle", colour_by = "Mutation_Status", layout = "box")
plots[[2]] <- plotColData(sce, y = "detected", x = "Cell_Cycle", colour_by = "Mutation_Status", layout = "box", point_shape = NA)

wrap_plots(plots)

image

alanocallaghan commented 4 months ago

Thanks, looks good!

I would prefer to add a second boolean show_boxplot, because overlaying a boxplot on a violin plot can be useful (it gives you the median and quartiles, at least). Also, it fits with show_violin, whereas otherwise there's some redundancy between show_violin and layout.

This would mean making the boxplots narrower, at least when show_violin is TRUE along with show_boxplot. I think a width of 0.25 is a nice magic number; I've used it in the past, but it may be worth experimenting a bit on example data.
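To make the overlay idea concrete, here is a minimal ggplot2 sketch of a narrow boxplot drawn inside a violin. It is not scater's internal code: the toy data frame, the grey fill, and the choice to hide outlier points are assumptions made for illustration; only the width of 0.25 comes from the comment above.

```r
library(ggplot2)

# Toy data, not from this thread: two groups with different spread.
set.seed(1)
df <- data.frame(
    group = rep(c("A", "B"), each = 100),
    value = c(rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 1, sd = 2))
)

p <- ggplot(df, aes(x = group, y = value)) +
    geom_violin(fill = "grey90") +
    # A narrow box sits inside the violin and marks the median and quartiles.
    # outlier.shape = NA hides the outlier points drawn by the boxplot.
    geom_boxplot(width = 0.25, outlier.shape = NA)

print(p)
```

The violin shows the shape of the distribution, while the box adds the summary statistics that a violin alone does not mark.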

alanocallaghan commented 4 months ago

An alternative would be a vector-valued argument, geoms, that supports violin and boxplot for now but potentially others in future. However, this would probably mean supporting and deprecating the existing version for a release cycle, which might make for messier code.
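For reference, the vector-valued alternative might look roughly like the sketch below during a deprecation cycle. The helper name resolve_geoms, its defaults, and the warning text are all hypothetical; nothing like this exists in scater as far as this thread shows. It only illustrates how the old logical flags could be mapped onto a geoms vector using base R.

```r
# Hypothetical helper, not part of scater: maps the existing logical flags
# onto a vector-valued `geoms` argument during a deprecation period.
resolve_geoms <- function(geoms = "violin", show_violin = NULL, show_boxplot = NULL) {
    supported <- c("violin", "boxplot")
    legacy_used <- !is.null(show_violin) || !is.null(show_boxplot)
    if (legacy_used) {
        warning("'show_violin' and 'show_boxplot' are deprecated; use 'geoms' instead.",
                call. = FALSE)
        # Assumed legacy defaults: violin on, boxplot off.
        if (is.null(show_violin)) show_violin <- TRUE
        if (is.null(show_boxplot)) show_boxplot <- FALSE
        geoms <- supported[c(show_violin, show_boxplot)]
    }
    # No geoms requested: only the points would be drawn.
    if (length(geoms) == 0L) return(character(0))
    # several.ok = TRUE lets the caller request more than one geom at once;
    # unsupported names raise an error.
    match.arg(geoms, supported, several.ok = TRUE)
}

resolve_geoms(c("violin", "boxplot"))                    # new-style call
suppressWarnings(resolve_geoms(show_boxplot = TRUE))     # old-style call
```

The extra branch for the legacy flags is the "messier code" cost mentioned above: it would have to stay in place until the old arguments are removed.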

TuomasBorman commented 4 months ago

show_boxplot seems to be the better solution. The boxplot width is now 0.25 when the violin plot is also drawn, and I think it looks nice. I would keep the default width when the violin is not plotted; it shows more clearly which points belong to which group.
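The width rule described here can be sketched as a small layer-building helper. This is an illustration only, assuming plain ggplot2 layers; distribution_layers is a hypothetical name and not the function scater uses internally, and the mtcars data is just a convenient built-in example.

```r
library(ggplot2)

# Hypothetical helper, not scater's internal code: build the distribution
# layers from the two logical flags discussed in this thread.
distribution_layers <- function(show_violin = TRUE, show_boxplot = FALSE) {
    layers <- list()
    if (show_violin) {
        layers <- c(layers, list(geom_violin()))
    }
    if (show_boxplot) {
        box <- if (show_violin) {
            geom_boxplot(width = 0.25)  # narrow, so it fits inside the violin
        } else {
            geom_boxplot()              # ggplot2's default width
        }
        layers <- c(layers, list(box))
    }
    layers
}

# A list of layers can be added to a plot with `+`.
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
    distribution_layers(show_violin = TRUE, show_boxplot = TRUE)
```

Keeping the width decision in one place means the narrow box only appears when both geoms are drawn together.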

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)

# Add per-cell QC metrics (e.g. sum, detected) to the column data
colData(example_sce) <- cbind(colData(example_sce), perCellQCMetrics(example_sce))

plots <- list()
plots[[1]] <- plotColData(example_sce, y = "Treatment", x = "sum", colour_by = "Mutation_Status", show_boxplot = TRUE)
plots[[2]] <- plotColData(example_sce, y = "Treatment", x = "sum", colour_by = "Mutation_Status")
plots[[3]] <- plotColData(example_sce, y = "detected", x = "Cell_Cycle", colour_by = "Mutation_Status", show_boxplot = TRUE)
plots[[4]] <- plotColData(example_sce, y = "detected", x = "Cell_Cycle", colour_by = "Mutation_Status", show_violin = FALSE)
# Subset to the first 20 genes and 20 cells, then plot expression
example_sce <- example_sce[1:20, 1:20]
plots[[5]] <- plotExpression(example_sce, rownames(example_sce)[1:5])
plots[[6]] <- plotExpression(example_sce, c("Gene_0001", "Gene_0004"), x="Mutation_Status", show_violin = FALSE)

plots[[7]] <- plotExpression(example_sce, rownames(example_sce)[1:5], show_boxplot = TRUE, point_alpha = 0.1, show_se = TRUE)
plots[[8]] <- plotExpression(example_sce, c("Gene_0001", "Gene_0004"), x="Mutation_Status", show_boxplot = TRUE, show_smooth = TRUE, show_violin = FALSE)
# Add per-feature QC metrics (e.g. mean) to the row data
rowData(example_sce) <- cbind(rowData(example_sce), perFeatureQCMetrics(example_sce))

plots[[9]] <- plotRowData(example_sce, y="mean", show_median = TRUE)
plots[[10]] <- plotRowData(example_sce, y="mean", show_boxplot = TRUE, show_violin = TRUE)

library(patchwork)

plots[[11]] <- plotColData(example_sce, y = "detected", x = "Cell_Cycle", colour_by = "Mutation_Status", show_boxplot = TRUE)
plots[[12]] <- plotColData(example_sce, y = "detected", x = "Cell_Cycle", colour_by = "Mutation_Status", show_boxplot = TRUE, point_shape = NA, show_violin = FALSE)

wrap_plots(plots)

image

The figure below shows the boxplots when the width is always 0.25:

image

TuomasBorman commented 4 months ago

> An alternative would be a vector-valued argument, geoms, that supports violin and boxplot for now but potentially others in future. However, this would probably mean supporting and deprecating the existing version for a release cycle, which might make for messier code.

That could be nice. It would make the code simpler when there are multiple choices. For these two (violin and box), I prefer the current solution, as it is quite simple.

TuomasBorman commented 4 months ago

Is there a way to disable coloring? (I did not find one with a quick search, and I'm in a hurry at the moment.) A user might want to create just a basic box plot without any colors.

image

alanocallaghan commented 4 months ago

btw I hope it goes without saying but you don't need to add your email when adding yourself as a ctb, it's entirely up to you

TuomasBorman commented 4 months ago

> btw I hope it goes without saying but you don't need to add your email when adding yourself as a ctb, it's entirely up to you

Ahh yes. I did not pay attention to that; I just copy-pasted my information. I would rather remove my email, but I can do that if I open another PR for the coloring.

alanocallaghan commented 4 months ago

Removed 35f30023e222e5b3c3fba73222095b06efa9b556

TuomasBorman commented 4 months ago

Looks great, thanks! Can address the colour thing in a separate PR if necessary

There was already an option for disabling colors, awesome.

plotExpression(example_sce, rownames(example_sce)[1:5], show_boxplot = TRUE, feature_colors = FALSE, show_violin = FALSE)

image