Open rpruim opened 4 years ago
Possible replacement for iris: height ~ mother, color = ~ sex, data = Galton. The separation isn't as interesting, and there are only 2 groups rather than 3, but it does illustrate some clustering (but mainly just because women are shorter).
I'd be willing to add a data set to mosaicData if there were a better example.
Can we avoid Galton? More eugenicists would not be my preference.
All the best,
Nick
I just emailed you about the same thing. I'm happy to use another data set -- but we need to find one.
What about palmerpenguins?
#remotes::install_github("allisonhorst/palmerpenguins")
suppressPackageStartupMessages(library(mosaic))
library(palmerpenguins)
gf_point(bill_length_mm ~ bill_depth_mm, color = ~ species, data = penguins)
#> Warning: Removed 2 rows containing missing values (geom_point).
Created on 2020-07-13 by the reprex package (v0.3.0)
That example is good, but it introduces YAD (yet another dependency).
Also, palmerpenguins is not yet on CRAN. So for now, this is a no-go.
Looks like there is a version of the penguins data set on CRAN, in modeldata::penguins. The help there indicates that the data are a snapshot from the palmerpenguins package.
Note: modeldata doesn't (currently) use lazy data (https://github.com/tidymodels/modeldata/issues/4), so an explicit data(penguins) is required. Also, the documentation in modeldata is pretty minimal.
Alternatively, we could add our own version of the data set to mosaicData, perhaps renaming the variables and adding labels.
Thoughts?
suppressPackageStartupMessages(library(ggformula))
suppressPackageStartupMessages(library(modeldata))
data(penguins)
names(penguins)
#> [1] "species" "island" "bill_length_mm"
#> [4] "bill_depth_mm" "flipper_length_mm" "body_mass_g"
#> [7] "sex"
gf_point(bill_length_mm ~ bill_depth_mm | island, color = ~ species, data = penguins)
#> Warning: Removed 2 rows containing missing values (geom_point).
gf_density_2d(bill_length_mm ~ bill_depth_mm | island, color = ~ species, data = penguins)
#> Warning: Removed 2 rows containing non-finite values (stat_density2d).
gf_density_ridges(species ~ bill_length_mm | island ~ ., fill = ~ species, data = penguins)
#> Picking joint bandwidth of 1.01
#> Picking joint bandwidth of 1.14
#> Picking joint bandwidth of 1.24
#> Warning: Removed 2 rows containing non-finite values (stat_density_ridges).
Created on 2020-07-13 by the reprex package (v0.3.0)
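For the mosaicData alternative mentioned above (our own copy with renamed variables and labels), a rough sketch of what that could look like is below. The new variable names, the label text, and the use of a "label" attribute are all assumptions made for illustration, not a decided design, and the sketch assumes the palmerpenguins package is installed as the source of the data.

```r
library(palmerpenguins)  # assumed to be installed; source of the data

# Work on a plain data frame copy so the original object is untouched.
Penguins <- as.data.frame(penguins)

# Hypothetical renaming scheme: drop the unit suffixes from the names.
new_names <- c(
  bill_length_mm    = "bill_length",
  bill_depth_mm     = "bill_depth",
  flipper_length_mm = "flipper_length",
  body_mass_g       = "body_mass"
)
idx <- match(names(new_names), names(Penguins))
names(Penguins)[idx] <- new_names

# Hypothetical labels that carry the units the names no longer show.
# Storing them in a "label" attribute is an assumed convention here.
var_labels <- c(
  bill_length    = "bill length (mm)",
  bill_depth     = "bill depth (mm)",
  flipper_length = "flipper length (mm)",
  body_mass      = "body mass (g)"
)
for (v in names(var_labels)) {
  attr(Penguins[[v]], "label") <- var_labels[[v]]
}

names(Penguins)
attr(Penguins$bill_length, "label")
```

Matching by name rather than by column position keeps the renaming from silently breaking if the upstream column order ever changes.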
That looks great!
See https://github.com/tidymodels/modeldata/issues/4 for discussion of using lazy data in the modeldata package. If we hear that this will happen soon, I'd be inclined to wait for that.
I've just converted all iris examples to palmerpenguins::penguins since palmerpenguins is now on CRAN.
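For anyone reviewing the converted examples, the general shape of the change is roughly the following. The iris call here is a made-up "before" for illustration, not one of the actual examples that was converted; the penguins call is the one from earlier in this thread.

```r
suppressPackageStartupMessages(library(ggformula))
library(palmerpenguins)

# Illustrative "before": a typical iris scatterplot (not an actual converted example).
p_iris <- gf_point(Sepal.Length ~ Sepal.Width, color = ~ Species, data = iris)

# "After": the same kind of plot using penguins, as posted earlier in this thread.
# Rows with missing bill measurements are dropped with a warning when plotted.
p_penguins <- gf_point(bill_length_mm ~ bill_depth_mm, color = ~ species, data = penguins)

p_penguins
```

Like iris, penguins has three species and several numeric measurements, so most examples should only need the variable names swapped.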
Go through all examples and see whether they can be improved.