eclarke / ggbeeswarm

Column scatter / beeswarm-style plots in ggplot2
GNU General Public License v3.0

geom_quasirandom connect grouped points with geom_line #55

Open wipperman opened 3 years ago

wipperman commented 3 years ago

Hello,

I would like to connect related points that are separated within groups using geom_quasirandom, but am unsure how to do this. With geom_jitter this is done with position_dodge (see here: https://stackoverflow.com/questions/39533456/how-to-jitter-both-geom-line-and-geom-point-by-the-same-magnitude); however, I am unable to figure out the equivalent for this package.

Below is a reproducible example. I feel like this is close; however, the points and their lines are not actually connected as they should be.

Any help or advice would be most appreciated. Thank you!!

library(ggplot2);library(ggbeeswarm);library(dplyr)

iris %>% 
  dplyr::mutate(flower = rep(1:nrow(iris), each = 3, len = nrow(iris))) %>% #make a variable to connect the dots
  dplyr::mutate(timepoint = rep(c(1,2,3), each = 1, len = nrow(iris))) %>% 
  ggplot2::ggplot(aes(x = interaction(Species, timepoint), y = Petal.Width)) + 
  ggplot2::geom_boxplot(outlier.shape = NA) + 
  ggplot2::geom_line(aes(group = flower), color = "grey") +
  ggbeeswarm::geom_quasirandom(groupOnX = T,
                               size = 4, 
                               pch = 1,
                               aes(fill = Species)) + 
  theme_classic() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Rplot01

eclarke commented 3 years ago

Hi @wipperman, thanks for suggesting this. I'm filing it as an enhancement that I'll try to incorporate in the next (very overdue) round of bug fixes.

krassowski commented 2 years ago

Adding position=ggbeeswarm::position_quasirandom() works well for me, I don't think any changes apart from adding an example are needed. An example:

library(ggplot2)
mpg$after2000 = mpg$year > 2000
data = aggregate(
    hwy ~ manufacturer + model + after2000 + drv,
    mpg,
    mean
)
(
    ggplot(data, aes(x=after2000, y=hwy))
    + geom_violin()
    + geom_line(
       aes(group=interaction(manufacturer, model)),
       color='grey',
       position=ggbeeswarm::position_quasirandom()
    )
    + ggbeeswarm::geom_quasirandom(aes(color=drv))
    + scale_color_discrete(
        labels=c(
            'f'='front-wheel drive',
            'r'='rear-wheel drive',
            '4'='four-wheel drive'
        ),
        name='the type of drive train'
    )
    + ylab('highway miles per gallon')
    + theme_bw()
)

image

lnalborczyk commented 1 year ago

Hi, thank you for the tip! It does not seem to work, though, when passing a non-null dodge.width argument to geom_quasirandom(). For instance, pseudocode like:

geom_quasirandom(
    dodge.width = 0.5,
    size = 2,
    alpha = 0.25,
    show.legend = FALSE
    ) +
geom_line(
    data = . %>% filter(mode != "Control"),
    aes(group = interaction(participant, syllable) ),
    size = 0.5,
    alpha = 0.25,
    show.legend = FALSE,
    position = ggbeeswarm::position_quasirandom(dodge.width = 0.5)
    ) +

Results in a plot like:

Rplot

Any idea on how to "connect the dots"?

Thanks!

Ladislas

eclarke commented 1 year ago

Hi @lnalborczyk, I looked into this and I believe that the issue is that defining a group aesthetic changes how the density calculation (and therefore the position) is performed. So in your example, the calculated 'group' being used to distribute the points is the combined x+color aesthetics, whereas the calculated 'group' for the line is (participant+syllable+x+color). Consequently, the position calculation for the lines is altered.

The previous example works and yours doesn't because specifying dodge.width triggers geom_quasirandom to adjust position by the calculated 'group' (not necessarily the group aesthetic). If dodge.width is NULL, it instead groups by the x aesthetic alone. Specifying dodge.width is essentially equivalent to adding position_dodge(width=...).

On that note, I looked to see whether one of the base ggplot2 position functions (jitter, dodge, or jitterdodge) could do what you're looking for, and I couldn't find a workable solution (though I might have overlooked one). One suggestion is to consider a faceting approach, which I think would work but certainly wouldn't look as clean as what you're envisioning.
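
Since the offset depends on which groups enter the density calculation, one way to keep points and lines aligned is to compute the offset once, outside ggplot2, and give both layers the same x positions. The following is only a minimal sketch of that idea, not something proposed in this thread: the toy data and the column names (subject, timepoint, value, x_num, x_pos) are made up for illustration, and it assumes vipor::offsetX(), the function from the vipor package that ggbeeswarm's quasirandom placement builds on. Because x becomes continuous, dodging and the discrete axis have to be handled by hand.

```r
library(ggplot2)

# Hypothetical toy data: 20 subjects, each measured at two timepoints
set.seed(1)
d <- data.frame(
  subject   = rep(1:20, times = 2),
  timepoint = factor(rep(c("t1", "t2"), each = 20)),
  value     = c(rnorm(20, mean = 5), rnorm(20, mean = 6))
)

# Compute the quasirandom offset once, grouped by the x variable only,
# so the grouping used for the lines cannot change the point positions
d$x_num <- as.numeric(d$timepoint)
d$x_pos <- d$x_num + vipor::offsetX(d$value, d$x_num, width = 0.4)

# Both layers read the same precomputed x, so every line ends on its points
p <- ggplot(d, aes(x = x_pos, y = value)) +
  geom_line(aes(group = subject), colour = "grey") +
  geom_point() +
  scale_x_continuous(
    breaks = seq_along(levels(d$timepoint)),
    labels = levels(d$timepoint),
    name   = "timepoint"
  )

print(p)
```

The trade-off is that the plot no longer uses geom_quasirandom() at all, so options such as dodge.width would have to be reproduced manually, for example by shifting x_num per subgroup before computing the offset.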

gibson-amandag commented 1 year ago

Hello! I recently discovered this package, and I'm very appreciative of its functionality. I'm trying to connect points with a geom_line, and I'm having trouble getting the line to follow the quasirandom position of the points. For example, when I try the example provided above, the lines are all positioned in the center.

I believe that I'm running version 0.7.2

library(ggplot2)
mpg$after2000 = mpg$year > 2000
data = aggregate(
    hwy ~ manufacturer + model + after2000 + drv,
    mpg,
    mean
)
(
    ggplot(data, aes(x=after2000, y=hwy))
    + geom_violin()
    + geom_line(
       aes(group=interaction(manufacturer, model)),
       color='grey',
       position=ggbeeswarm::position_quasirandom()
    )
    + ggbeeswarm::geom_quasirandom(aes(color=drv))
    + scale_color_discrete(
        labels=c(
            'f'='front-wheel drive',
            'r'='rear-wheel drive',
            '4'='four-wheel drive'
        ),
        name='the type of drive train'
    )
    + ylab('highway miles per gallon')
    + theme_bw()
)

testLineQuairandom

krassowski commented 1 year ago

@gibson-amandag I can confirm that this is a regression. Your code works well in 0.6.0 but fails in 0.7.1 and 0.7.2.

[Images: the same plot rendered with 0.6.0 (left) and 0.7.2 (right)]
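
To check whether an installation is one of the versions reported as failing here, something like the following could be used. It is only a sketch: the list of affected versions is taken from this comment alone and may be incomplete, and the commented-out downgrade line assumes that the remotes package is installed and that 0.6.0 is still available from the CRAN archive.

```r
# Version of ggbeeswarm currently installed
installed <- as.character(utils::packageVersion("ggbeeswarm"))

# Versions reported in this thread as breaking geom_line + position_quasirandom
affected <- installed %in% c("0.7.1", "0.7.2")

message(
  "ggbeeswarm ", installed,
  if (affected) " is one of the versions reported as affected"
  else " is not one of the versions reported as affected"
)

# Possible temporary downgrade (assumption: 0.6.0 can still be installed from CRAN):
# remotes::install_version("ggbeeswarm", version = "0.6.0")
```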

A workaround would be to use geom_segment, but it only works partially. It seems to give up on the second half of each segment, so only the start points follow the quasirandom offset:

image

library(ggplot2)
mpg$after2000 = mpg$year > 2000
data = aggregate(
    hwy ~ manufacturer + model + after2000 + drv,
    mpg,
    mean
)
(
    ggplot(data, aes(x=after2000, y=hwy))
    + geom_violin()
    + geom_segment(
       data=unstack(data, hwy ~ after2000),
       aes(x=FALSE, xend=TRUE, y=FALSE., yend=TRUE.),
       color='grey',
       position=ggbeeswarm::position_quasirandom()
    )
    + ggbeeswarm::geom_quasirandom(aes(color=drv))
    + scale_color_discrete(
        labels=c(
            'f'='front-wheel drive',
            'r'='rear-wheel drive',
            '4'='four-wheel drive'
        ),
        name='the type of drive train'
    )
    + ylab('highway miles per gallon')
    + theme_bw()
)

krassowski commented 1 year ago

Here is a monkeypatch which fixes the geom_segment approach in version 0.7.2:

ggbeeswarm <- getNamespace("ggbeeswarm")
unlockBinding("offset_quasirandom", ggbeeswarm)

ggbeeswarm$offset_quasirandom <- function(
  data,
  width = 0.4,
  vary.width = FALSE,
  max.length = NULL,
  ...
) {
  x.offset <- vipor::aveWithArgs(
    data$y, data$x,
    FUN = vipor::offsetSingleGroup,
    maxLength = if (vary.width) {max.length} else {NULL},
    ...
  )

  x.offset <- x.offset * width
  data$x <- data$x + x.offset

  if ('xend' %in% colnames(data) && 'yend' %in% colnames(data)) {
      x.offset <- vipor::aveWithArgs(
        data$yend, data$xend,
        FUN = vipor::offsetSingleGroup,
        maxLength = if (vary.width) {max.length} else {NULL},
        ...
      )

      x.offset <- x.offset * width
      data$xend <- data$xend + x.offset
  }
  data
}
lockBinding("offset_quasirandom", ggbeeswarm)

image

eclarke commented 1 year ago

Hi @gibson-amandag, thanks for the bug report and reprex, and apologies for the delay in responding. @krassowski, thanks for figuring out a monkeypatch; I'll look into incorporating it into the package.

krassowski commented 1 year ago

PR is ready :) But if you find it easier to rework it, feel free to close; no hard feelings.

Laurent-ZT commented 8 months ago

Hi,

Thank you for this great function, which I have been using.

I have the same issue of lines being disconnected from points (I am using geom_line since there are several points along my lines). I would love to see this working in a future version, although I understand that it may not be your priority. Thanks!

roaldarbol commented 3 months ago

Quick follow-up question, which I haven't tested: the new implementation of position_quasirandom() allows this with geom_segment() now (#89). Is it currently possible to do exactly the same with position_beeswarm(), or does that need a new PR?