update data sets used in examples

rpruim commented 4 years ago

Go through all examples and see whether they can be improved.

[ ] give the examples a comprehensive look for value, consistency, etc.
[x] eliminate iris
[ ] eliminate other not-so-compelling choices
[ ] eliminate examples that are not so useful
[ ] add examples for features that could use an example but don't have one.

rpruim commented 4 years ago

Possible replacement for iris: height ~ mother, color = ~ sex, data = Galton. The separation isn't as interesting, and there are only 2 groups rather than 3, but it does illustrate some clustering (mainly because women are shorter).

I'd be willing to add a data set to mosaicData if there were a better example.

nicholasjhorton commented 4 years ago

Can we avoid Galton? More eugenicists would not be my preference.

All the best,

Nick


rpruim commented 4 years ago

I just emailed you about the same thing. I'm happy to use another data set -- but we need to find one.

nicholasjhorton commented 4 years ago

What about palmerpenguins?

#remotes::install_github("allisonhorst/palmerpenguins")
suppressPackageStartupMessages(library(mosaic))
library(palmerpenguins)
gf_point(bill_length_mm ~ bill_depth_mm, color = ~ species, data = penguins)
#> Warning: Removed 2 rows containing missing values (geom_point).

Created on 2020-07-13 by the reprex package (v0.3.0)

rpruim commented 4 years ago

That example is good but introduces YAD (yet another dependency).

rpruim commented 4 years ago

Also: not yet on CRAN. So for now, this is a no go.

rpruim commented 4 years ago

Looks like there is a version of the penguins data set in modeldata::penguins on CRAN. The help there indicates that the data are a snapshot from the palmerpenguins package.

Note: modeldata doesn't (currently) use lazy data (https://github.com/tidymodels/modeldata/issues/4), so an explicit data(penguins) is required. Also the documentation is pretty minimal in modeldata.

Alternatively, we could add our own version of the data set to mosaicData, perhaps renaming the variables and adding labels.
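
A minimal sketch of what that second option might look like, assuming the modeldata snapshot as the starting point. The object name `Penguins`, the shortened column names, and the label text are all hypothetical, and storing labels in a "label" attribute is just one common convention, not something settled here.

```r
# Sketch only: the new names and label text below are hypothetical.
data(penguins, package = "modeldata")   # explicit load of the snapshot

Penguins <- as.data.frame(penguins)

# Drop the unit suffixes from the measurement names.
new_names <- c(
  bill_length_mm    = "bill_length",
  bill_depth_mm     = "bill_depth",
  flipper_length_mm = "flipper_length",
  body_mass_g       = "body_mass"
)
idx <- match(names(new_names), names(Penguins))
names(Penguins)[idx] <- new_names

# Record the units in a "label" attribute on each renamed column instead.
labels <- c(
  bill_length    = "bill length (mm)",
  bill_depth     = "bill depth (mm)",
  flipper_length = "flipper length (mm)",
  body_mass      = "body mass (g)"
)
for (v in names(labels)) {
  attr(Penguins[[v]], "label") <- labels[[v]]
}
```

The existing examples would then use formulas such as bill_length ~ bill_depth in place of the longer names.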

Thoughts?

rpruim commented 4 years ago

suppressPackageStartupMessages(library(ggformula))
suppressPackageStartupMessages(library(modeldata))
data(penguins)
names(penguins)
#> [1] "species"           "island"            "bill_length_mm"   
#> [4] "bill_depth_mm"     "flipper_length_mm" "body_mass_g"      
#> [7] "sex"
gf_point(bill_length_mm ~ bill_depth_mm | island, color = ~ species, data = penguins)
#> Warning: Removed 2 rows containing missing values (geom_point).

gf_density_2d(bill_length_mm ~ bill_depth_mm | island, color = ~ species, data = penguins)
#> Warning: Removed 2 rows containing non-finite values (stat_density2d).

gf_density_ridges(species ~ bill_length_mm | island ~ ., fill = ~ species, data = penguins)
#> Picking joint bandwidth of 1.01
#> Picking joint bandwidth of 1.14
#> Picking joint bandwidth of 1.24
#> Warning: Removed 2 rows containing non-finite values (stat_density_ridges).

Created on 2020-07-13 by the reprex package (v0.3.0)

nicholasjhorton commented 4 years ago

That looks great!

rpruim commented 4 years ago

See https://github.com/tidymodels/modeldata/issues/4 for discussion of using lazy data in the modeldata package. If we hear that this will happen soon, I'd be inclined to wait for that.

rpruim commented 4 years ago

I've just converted all iris examples to palmerpenguins::penguins since palmerpenguins is now on CRAN.
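
For reference, a conversion of this kind looks roughly like the pair below. This is an illustrative before/after sketch, not a diff taken from the package: the iris call stands in for a typical old-style example, and the object names `p_before` and `p_after` are made up for the illustration.

```r
library(ggformula)

# Before: a typical iris-based example (illustrative, not a specific one that was changed).
p_before <- gf_point(Sepal.Length ~ Sepal.Width, color = ~ Species, data = iris)

# After: the same kind of plot using the penguins data.
# palmerpenguins is referenced with :: so it does not need to be attached.
p_after <- gf_point(bill_length_mm ~ bill_depth_mm, color = ~ species,
                    data = palmerpenguins::penguins)
```

Printing `p_after` draws the plot and, as in the reprex output earlier in this thread, warns about the 2 rows with missing values.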

ProjectMOSAIC / mosaic

update data sets used in examples #762