ACCLAB / dabestr

Data Analysis with Bootstrap Estimation in R
https://acclab.github.io/dabestr
Apache License 2.0
214 stars 34 forks source link

Adding error bar to points #50

Open JonatanFernandez opened 5 years ago

JonatanFernandez commented 5 years ago

Hello!

I've been trying to add error bars to the individual points (for technical replicates in an experiment). It seems the plot function generates a ggplot2 object but it's not mantaining the original variable names maybe?

require(dabestr)
require(dplyr)
require(ggplot2)

GroupCtrl = rnorm(n = 5, mean = 5000000, sd = 500000)
GroupTest = rnorm(n = 5, mean = 4500000, sd = 450000)

SimulatedData = data.frame(Group = factor(c(rep(0,15),rep(1,15))),
                      Replicate = factor(rep(rep(c(1,2,3,4,5), each = 3),2)),
                      PeakArea = c(sapply(GroupCtrl, function(x){x + rnorm(n = 1, mean = 0, sd = 35000)}),
                                     sapply(GroupCtrl, function(x){x + rnorm(n = 1, mean = 0, sd = 45000)}),
                                     sapply(GroupCtrl, function(x){x + rnorm(n = 1, mean = 0, sd = 15000)}),
                                     sapply(GroupTest, function(x){x + rnorm(n = 1, mean = 0, sd = 25000)}),
                                     sapply(GroupTest, function(x){x + rnorm(n = 1, mean = 0, sd = 20000)}),
                                     sapply(GroupTest, function(x){x + rnorm(n = 1, mean = 0, sd = 30000)})))
levels(SimulatedData$Group) = c('Control', 'Treatment')

SEM = function(x){return(sd(x)/sqrt(length(x)))}

GroupedData = SimulatedData %>% group_by(.dots = c('Group','Replicate')) %>% summarize(AvrgPA = mean(PeakArea),SEM = SEM(PeakArea))
unpaired_mean_diff <- dabest(GroupedData, Group, AvrgPA,
                             idx = c("Control", "Treatment"),
                             paired = FALSE) 

p = plot(unpaired_mean_diff) 
p

So far so good, but if I try to add error bars..

p + geom_pointrange(aes(ymin=AvrgPA-SEM, ymax=AvrgPA+SEM))

Error in FUN(X[[i]], ...) : object 'AvrgPA' not found

Any workaround this?

Thank you so much!

josesho commented 5 years ago

Hi! You've got two issues here: technical replicates, and handling the output of plot.dabestr().

1. Technical replicates

dabestr does not currently have the ability to handle multilevel models, which in your case represents technical replicates of the same sample.

Using your example code above, with

set.seed(12345)

for reproducibility, you get the following Gardner-Altman plot: Rplot-issue50

We've deliberately designed the Gardner-Altman plot to not display the individual groups' mean±SD, because the presence of multiple (potentially overlapping) horizontal and vertical lines is aesthetically jarring, and likely to be distracting.

One way around it is to use

plot(unpaired_mean_diff, float.contrast=FALSE)

to obtain a Cumming plot instead.

Rplot-issue50-2

where the means±SD are plotted alongside each group in the top panel. (The means are depicted as gaps in the vertical error-bar-style lines; the ends of these lines represent the upper and lower SDs.)

BUT, as noted, these are TECHNICAL REPLICATES, so the effect size is actually not correct as they are not independent samples from the same population. I'm not sure what your experimental design is, but I would suggest obtaining multiple samples (with Ns to give you sufficient power) and computing the effect size.

PS If you have multilevel modelling code for bootstrap effect sizes sitting around, please feel free to do a pull request and add it to dabestr!

2. Output of plot.dabestr()

The output of plot.dabestr() might look like a ggplot2 object, but we've used cowplot to stitch together two separate plots. I'm not sure if one can replot in the ggplot style of adding to the grob with a cowplot object.... You might have more luck on that package's Github issue tracker!

Hope this helped!

JonatanFernandez commented 5 years ago

Hello Joses!

Thank you for your detailed answer. I am aware of the multilevel design issue, but in biology most of the time the technical replicates are reduced to 3: they are mainly interpreted as a quality metric and they are too few to perform bootstrap on them. I didn't want to overcomplicate the analysis, and I will just try to plot the average as a representative measure of all the technical replicates together with the SEM of each biological sample (group of replicates) to put in context the small relative error of the technical process in comparison to the biological variance.

I'll try to play a little bit with the code and see what I can make.

Thanks.

JonatanFernandez commented 5 years ago

Hello Again,

I made some progress (see below) but I am a bit stuck trying to find the variable add the jitter to the geom_pointrange.

image

Regarding the presence of multiple vertical lines and the jarring / swamp effect, this is just a proof of concept with simulated data: In the kind of data I am planning to plot the error bars should be much smaller and the sample size is still relatively small, so it should be ok.

Thanks!

JonatanFernandez commented 5 years ago

Ok, I figured out how to make a proof of concept..

image

Basically I modified the function plot.dabest :

if (rawplot.type == 'swarmplot') {
    rawdata.plot <-
      rawdata.plot +
      do.call(ggbeeswarm::geom_quasirandom, swarmplot.params)

I commented _rawdata.plot <- rawdata.plot + do.call(ggbeeswarm::geomquasirandom, swarmplot.params)

and substituted it for

  rawdata.plot <-
        rawdata.plot +
        ggplot2::geom_pointrange(alpha = 0.75, size = 0.5,
                               aes(!!x_enquo, !!y_enquo, color = Group,
                                   ymin= AvrgPA - SEM,
                                   ymax= AvrgPA + SEM),
                               position = position_dodge2(width = 1.25, padding = 0))

It took me a while to fix the dodge. I am not very proficient with R. It would be really cool if you could add a more consistent version of this to the package (as an alternative to the swarmplot and sineplot, making the function request for a sd/sem column, maybe?).

Thanks again!

josesho commented 5 years ago

This looks interesting.... Could you start a pull request, and we can see how to finesse this? (You will be listed as a contributor if you start a pull request, which would be more favourable than if I copy-pasted your code.)

JonatanFernandez commented 5 years ago

Ok, pull request opened! I'll be taking a look on the code to see how do you want to organize the function, and I'll try to contribute further.

I've noticed that the webpage plotting tool actually have some more features in terms of size effects? Aren´t those metrics implemented yet in the R package?

Best regards,

josesho commented 5 years ago

Thanks for the PR!

Re: standardized effect sizes—these are in the pipeline (see #31 and #32). The webapp uses the Python DABEST package as the backend; this has the standardized effect sizes implemented already.