datacarpentry / R-ecology-lesson

Data Analysis and Visualization in R for Ecologists
https://datacarpentry.org/R-ecology-lesson/

"Renaming factors" section does not reflect true results of running these commands #728

Closed rkmeade closed 3 years ago

rkmeade commented 3 years ago

Hi Maintainers!

A quick comment on the "Renaming factors" section: the commands do not behave in my console the way the episode says they should.

Beginning with the first command, plot(surveys$sex), my console plots the ~1700 missing values as their own bar (which is not shown on the plot in the episode). I believe this is because, instead of NA, R recognizes a third category of values: the empty string "".

In the next set of commands:

sex <- surveys$sex
levels(sex)

The lesson says the output should be: [1] "F" "M"

This is what I get: [1] "" "F" "M"

In the next code block, a new category for missing values is added:

sex <- addNA(sex)
levels(sex)

The lesson says the output should be: [1] "F" "M" NA

My output now has two equivalents of missing values: [1] "" "F" "M" NA

I believe all downstream errors can be remediated by running this before the initial plot command: levels(sex)[1] <- NA
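A minimal sketch of this workaround on a small made-up factor (the vector below is hypothetical, not the survey data). Assigning NA to a level drops that level and turns its values into missing values. In the lesson's workflow, where `sex` is only created after the first plot, the same assignment would presumably need to target `surveys$sex`.

```r
# Hypothetical stand-in for a sex column where blank fields were read as ""
sex <- factor(c("M", "M", "", "", "F"))
levels(sex)            # "" sorts first, so it is level 1

# Setting a level to NA removes the level; its values become NA
levels(sex)[1] <- NA
levels(sex)            # only "F" and "M" remain
sum(is.na(sex))        # the two former "" values are now missing
```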

I hope this is helpful!

-- Rachel

Teebusch commented 3 years ago

Hi @rkmeade, thank you for raising this issue. The code from the lesson runs as expected for me; see the reproducible example below. However, I can replicate your issue by using read.csv() (base R) instead of read_csv() (tidyverse, used in the lesson). This is an easy mistake to make and has been brought up a few times (e.g., #710). We could probably do a better job of preventing it.
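The difference comes from how the two readers handle empty fields. By default, read.csv() only treats the string "NA" as missing, so an empty field in a character column stays "". read_csv() also treats empty fields as missing by default. A minimal base R sketch, using a made-up inline CSV (`csv_text` is hypothetical, not the survey file), shows the effect and one way to get NA from read.csv() via its `na.strings` argument:

```r
# Hypothetical four-row CSV; record 3 has an empty sex field
csv_text <- "record_id,sex\n1,M\n2,F\n3,\n4,M\n"

# Base R default: the empty field is kept as "" and becomes a factor level
default_read <- read.csv(text = csv_text)
levels(factor(default_read$sex))
#> [1] ""  "F" "M"

# Treating "" as missing gives the levels the lesson expects
na_read <- read.csv(text = csv_text, na.strings = c("", "NA"))
levels(factor(na_read$sex))
#> [1] "F" "M"
```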

Correct output, using read_csv()

## Loading the survey data
# modified slightly, for reprex to work
library(tidyverse)
surveys <- read_csv("https://ndownloader.figshare.com/files/2292169")
#> 
#> -- Column specification --------------------------------------------------------
#> cols(
#>   record_id = col_double(),
#>   month = col_double(),
#>   day = col_double(),
#>   year = col_double(),
#>   plot_id = col_double(),
#>   species_id = col_character(),
#>   sex = col_character(),
#>   hindfoot_length = col_double(),
#>   weight = col_double(),
#>   genus = col_character(),
#>   species = col_character(),
#>   taxa = col_character(),
#>   plot_type = col_character()
#> )

# ...

## Factors
surveys$sex <- factor(surveys$sex)

# ...

### Renaming factors

plot(surveys$sex)

sex <- surveys$sex
levels(sex)
#> [1] "F" "M"
sex <- addNA(sex)
levels(sex)
#> [1] "F" "M" NA
head(sex)
#> [1] M    M    <NA> <NA> <NA> <NA>
#> Levels: F M <NA>
levels(sex)[3] <- "undetermined"
levels(sex)
#> [1] "F"            "M"            "undetermined"
head(sex)
#> [1] M            M            undetermined undetermined undetermined
#> [6] undetermined
#> Levels: F M undetermined
plot(sex)


Unexpected output, using read.csv()

## Loading the survey data
# modified slightly, for reprex to work
library(tidyverse)
surveys <- read.csv("https://ndownloader.figshare.com/files/2292169")

# ...

## Factors
surveys$sex <- factor(surveys$sex)

# ...

### Renaming factors

plot(surveys$sex)

sex <- surveys$sex
levels(sex)
#> [1] ""  "F" "M"
sex <- addNA(sex)
levels(sex)
#> [1] ""  "F" "M" NA
head(sex)
#> [1] M M        
#> Levels:  F M <NA>
levels(sex)[3] <- "undetermined"
levels(sex)
#> [1] ""             "F"            "undetermined"
head(sex)
#> [1] undetermined undetermined                                       
#> [6]             
#> Levels:  F undetermined
plot(sex)

Created on 2021-07-05 by the reprex package (v2.0.0)
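In the read.csv() run, `levels(sex)[3] <- "undetermined"` renames "M" because position 3 no longer holds the NA level. A hedged sketch of renaming a level by value instead of by position, again on a small made-up factor (hypothetical data, not the lesson's code):

```r
# Hypothetical stand-in: blank fields were read as "" (as with read.csv())
sex <- factor(c("M", "M", "", "", "F"))

# Select the level by its value, so the result does not depend on
# where that level happens to sit in levels(sex)
levels(sex)[levels(sex) == ""] <- "undetermined"
levels(sex)
#> [1] "undetermined" "F"            "M"
table(sex)
```

The same idea applies after addNA(): indexing with `is.na(levels(sex))` targets the NA level wherever it is.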

Session info

``` r
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value
#>  version  R version 4.0.4 (2021-02-15)
#>  os       Windows 10 x64
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United Kingdom.1252
#>  ctype    English_United Kingdom.1252
#>  tz       Europe/Paris
#>  date     2021-07-05
#>
#> - Packages -------------------------------------------------------------------
#>  package     * version date       lib source
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.2)
#>  backports     1.2.1   2020-12-09 [1] CRAN (R 4.0.3)
#>  broom         0.7.6   2021-04-05 [1] CRAN (R 4.0.4)
#>  cellranger    1.1.0   2016-07-27 [1] CRAN (R 4.0.2)
#>  cli           2.5.0   2021-04-26 [1] CRAN (R 4.0.4)
#>  colorspace    2.0-1   2021-05-04 [1] CRAN (R 4.0.5)
#>  crayon        1.4.1   2021-02-08 [1] CRAN (R 4.0.4)
#>  curl          4.3.1   2021-04-30 [1] CRAN (R 4.0.5)
#>  DBI           1.1.1   2021-01-15 [1] CRAN (R 4.0.3)
#>  dbplyr        2.1.1   2021-04-06 [1] CRAN (R 4.0.5)
#>  digest        0.6.27  2020-10-24 [1] CRAN (R 4.0.3)
#>  dplyr       * 1.0.6   2021-05-05 [1] CRAN (R 4.0.4)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.0.5)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.2)
#>  fansi         0.4.2   2021-01-15 [1] CRAN (R 4.0.3)
#>  forcats     * 0.5.1   2021-01-27 [1] CRAN (R 4.0.3)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.2)
#>  generics      0.1.0   2020-10-31 [1] CRAN (R 4.0.2)
#>  ggplot2     * 3.3.3   2020-12-30 [1] CRAN (R 4.0.3)
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.2)
#>  gtable        0.3.0   2019-03-25 [1] CRAN (R 4.0.2)
#>  haven         2.4.1   2021-04-23 [1] CRAN (R 4.0.5)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.0.4)
#>  hms           1.0.0   2021-01-13 [1] CRAN (R 4.0.3)
#>  htmltools     0.5.1.1 2021-01-22 [1] CRAN (R 4.0.3)
#>  httr          1.4.2   2020-07-20 [1] CRAN (R 4.0.2)
#>  jsonlite      1.7.2   2020-12-09 [1] CRAN (R 4.0.3)
#>  knitr         1.33    2021-04-24 [1] CRAN (R 4.0.5)
#>  lifecycle     1.0.0   2021-02-15 [1] CRAN (R 4.0.4)
#>  lubridate     1.7.10  2021-02-26 [1] CRAN (R 4.0.4)
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.0.3)
#>  mime          0.10    2021-02-13 [1] CRAN (R 4.0.4)
#>  modelr        0.1.8   2020-05-19 [1] CRAN (R 4.0.2)
#>  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.0.2)
#>  pillar        1.6.0   2021-04-13 [1] CRAN (R 4.0.5)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.2)
#>  ps            1.6.0   2021-02-28 [1] CRAN (R 4.0.5)
#>  purrr       * 0.3.4   2020-04-17 [1] CRAN (R 4.0.2)
#>  R6            2.5.0   2020-10-28 [1] CRAN (R 4.0.2)
#>  Rcpp          1.0.6   2021-01-15 [1] CRAN (R 4.0.3)
#>  readr       * 1.4.0   2020-10-05 [1] CRAN (R 4.0.3)
#>  readxl        1.3.1   2019-03-13 [1] CRAN (R 4.0.2)
#>  reprex        2.0.0   2021-04-02 [1] CRAN (R 4.0.5)
#>  rlang         0.4.11  2021-04-30 [1] CRAN (R 4.0.5)
#>  rmarkdown     2.8     2021-05-07 [1] CRAN (R 4.0.5)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.0.3)
#>  rvest         1.0.0   2021-03-09 [1] CRAN (R 4.0.4)
#>  scales        1.1.1   2020-05-11 [1] CRAN (R 4.0.2)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.2)
#>  stringi       1.6.1   2021-05-10 [1] CRAN (R 4.0.4)
#>  stringr     * 1.4.0   2019-02-10 [1] CRAN (R 4.0.2)
#>  styler        1.4.1   2021-03-30 [1] CRAN (R 4.0.4)
#>  tibble      * 3.1.1   2021-04-18 [1] CRAN (R 4.0.5)
#>  tidyr       * 1.1.3   2021-03-03 [1] CRAN (R 4.0.4)
#>  tidyselect    1.1.1   2021-04-30 [1] CRAN (R 4.0.5)
#>  tidyverse   * 1.3.1   2021-04-15 [1] CRAN (R 4.0.4)
#>  utf8          1.2.1   2021-03-12 [1] CRAN (R 4.0.5)
#>  vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.0.5)
#>  withr         2.4.2   2021-04-18 [1] CRAN (R 4.0.4)
#>  xfun          0.22    2021-03-11 [1] CRAN (R 4.0.4)
#>  xml2          1.3.2   2020-04-23 [1] CRAN (R 4.0.2)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.2)
#>
#> [1] C:/Users/teebu/Rlib
#> [2] C:/Program Files/R/R-4.0.4/library
```