Closed ggrothendieck closed 10 months ago
I cannot imagine a way of doing the selection within the ggplot code without modifying data. I am not an expert on how facets are implemented, but in the grammar of graphics the data from layers is not expected to be visible outside the layer. As far as I know, faceting always depends on a variable in the argument passed to data in the call to ggplot(), not on data returned by statistics.
If that is not possible, what about running ggplot twice, with the first instance generating all panels and the second instance generating only the R squared >= 0.97 panels, making use of the computations done in the first? The idea would be that the two ggplot instances would be nearly the same, making the coding simpler.
The code in ggplot2 statistics runs when the plot is rendered into graphical objects, not before. What is wrong with the approach of subsetting the data before plotting? Furthermore, in most cases not plotting all the data would mislead the viewer.
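To illustrate the point about when statistics run, here is a minimal sketch using the built-in mtcars data (chosen only for illustration; it is not part of this thread). It assumes that ggplot_build() is what triggers the stat computation and that the fitted values then appear in the built layer data.

```r
library(ggplot2)

# Creating the plot object does not fit any model yet
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# The plot object still holds only the raw input data
raw <- p$data

# Building the plot runs the stat; the fitted line lives in the layer data
built <- ggplot_build(p)$data[[1]]
head(built[, c("x", "y")])
```

This is why a facet specification, which is resolved against the input data, cannot refer to values that a stat computes later.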
The subsetting is not terrible, but I was hoping to simplify the code by eliminating the entire first pipeline. Also, in reality there could be many panels while interest is only in the high R^2 panels. Here is a slightly better example. It generates 12 panels if not cut down, but only 4 if the filtering is done, so it is easier to focus on what is relevant. Anyway, I will stick with my current solution or look around a bit more for alternatives.
library(broom)
library(dplyr)
library(ggplot2)
library(ggpmisc)
# find Plants for which R squared >= 0.60
Plants <- CO2 %>%
  nest_by(Plant) %>%
  summarize(model = list(lm(uptake ~ conc, data)), glance(model)) %>%
  filter(r.squared >= 0.60) %>%
  pull(Plant)

# plot
if (length(Plants)) {
  p <- CO2 %>%
    filter(Plant %in% Plants) %>% # omit this line to see all panels
    ggplot(aes(conc, uptake)) +
    geom_point() +
    stat_poly_eq() +
    geom_smooth(method = "lm", se = FALSE) +
    facet_wrap(~ Plant)
  plot(p)
}
I have found ggplot_build and %+% and now have this solution for plotting only panels with R^2 > 0.6. I assume ggpmisc put the R squared values there. Maybe it could provide an extraction function so one could simplify the ugly line below that ends with ##.
library(broom)
library(dplyr)
library(ggplot2)
library(ggpmisc)
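# build the full plot with all panels
p <- CO2 %>%
  ggplot(aes(conc, uptake)) +
  geom_point() +
  stat_poly_eq() +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ Plant)
plot(p)

# keep only the Plants whose panel has R squared > 0.6, then swap in the subset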
Plants <- levels(CO2$Plant)[ggplot_build(p)$data[[2]]$r.squared > .6] ##
if (length(Plants)) p %+% filter(CO2, Plant %in% Plants)
Suppose the function were called get.r.squared, say:
get.r.squared <- function(p) ggplot_build(p)$data[[2]]$r.squared
Then we could write the line marked ## above as
Plants <- levels(CO2$Plant)[get.r.squared(p) > .6]
If a neat way is created to help with the filtering of facets, it would be useful to provide a message saying how many facets there were originally and how many are returned. This goes some way toward addressing Pedro's concern about misleading results.
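A minimal sketch of what such a message could look like, using base R only. The helper name filter_panels and its arguments are hypothetical and are not part of ggplot2 or ggpmisc; the R squared values are computed here with a separate lm fit per Plant rather than extracted from the plot.

```r
# Hypothetical helper: keep only the rows whose facet value is in `keep`
# and report how many panels remain out of the original number.
filter_panels <- function(data, var, keep) {
  all_panels <- unique(as.character(data[[var]]))
  kept <- intersect(all_panels, as.character(keep))
  message(sprintf("Showing %d of %d panels", length(kept), length(all_panels)))
  data[as.character(data[[var]]) %in% kept, , drop = FALSE]
}

# R squared of uptake ~ conc for each Plant, computed outside ggplot2
r2 <- sapply(split(CO2, CO2$Plant),
             function(d) summary(lm(uptake ~ conc, data = d))$r.squared)

# Subset of CO2 restricted to the Plants at or above the threshold
sub <- filter_panels(CO2, "Plant", names(r2)[r2 >= 0.60])
```

The returned data frame can then be passed to ggplot(), or swapped into an existing plot with %+%, in the same way as the filtered data in the examples above.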
I would like to be able to show only the facets with high R squared. The code below does it, but it would be easier if it could be done entirely within ggplot2, i.e. within the second pipeline. In particular, we ran lm in the first pipeline and then again, implicitly, in the second pipeline. More importantly, simpler code would be nice. A variation would be to show only the top k panels by R squared, where k is specified.
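The top k variation can be sketched outside ggplot2 with base R. The helper name top_k_plants is hypothetical, and the model uptake ~ conc is taken from the examples in this thread; this still fits lm separately from the plot, so it does not remove the duplicated computation.

```r
# Hypothetical helper: names of the k Plants with the highest R squared
top_k_plants <- function(data, k) {
  # one lm fit per Plant, keeping only its R squared
  r2 <- sapply(split(data, data$Plant),
               function(d) summary(lm(uptake ~ conc, data = d))$r.squared)
  # sort descending and take at most k names
  names(sort(r2, decreasing = TRUE))[seq_len(min(k, length(r2)))]
}

Plants <- top_k_plants(CO2, k = 4)

# rows of CO2 belonging to the selected Plants, ready for plotting
CO2_top <- CO2[CO2$Plant %in% Plants, ]
```

CO2_top can be plotted with the same ggplot pipeline as the full data, so only the selected panels are drawn.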