bmaitner / RBIEN

Tools for accessing the Botanical Information and Ecology Network (BIEN) database
http://bien.nceas.ucsb.edu/bien/

Some species names are not capitalized #24

Open Rekyt opened 3 years ago

Rekyt commented 3 years ago

When retrieving a species list from the range maps, I noticed, while matching it against another species list, that some species names were not capitalized. Here is a reprex:

library("dplyr")
#> Warning: package 'dplyr' was built under R version 4.0.3
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Species names of the range maps returned for this bounding box,
# keeping only those whose first character is a lowercase letter
BIEN::BIEN_ranges_box(
  min.lat = -55.61831, max.lat = 83.64513,
  min.long = -171.79111, max.long = -12.20855,
  species.names.only = TRUE) %>%
  filter(substr(species, 1, 1) %in% letters)
#>                  species
#> 1  chamomilla_chamomilla
#> 2          dubius_dubius
#> 3           dubius_subsp
#> 4       lachenalii_subsp
#> 5    polytaenium_jenmani
#> 6    syngonanthus_nitens
#> 7             x_Catyclia
#> 8          x_Agrohordeum
#> 9            x_Agropogon
#> 10          x_Elyhordeum
#> 11         x_Elysitanion
#> 12         x_Festulolium
#> 13         x_Pseudelymus
#> 14 xPseudelymus_saxicola
#> 15       x_Stiporyzopsis
#> 16  anthurium_geherrerae

Created on 2020-11-24 by the reprex package (v0.3.0)

Session info

``` r
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value
#>  version  R version 4.0.2 (2020-06-22)
#>  os       Windows 10 x64
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  French_France.1252
#>  ctype    French_France.1252
#>  tz       Europe/Berlin
#>  date     2020-11-24
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version  date       lib source
#>  ape           5.4-1    2020-08-13 [1] CRAN (R 4.0.3)
#>  assertthat    0.2.1    2019-03-21 [1] CRAN (R 4.0.3)
#>  BIEN          1.2.4    2020-02-27 [1] CRAN (R 4.0.3)
#>  callr         3.5.1    2020-10-13 [1] CRAN (R 4.0.3)
#>  class         7.3-17   2020-04-26 [2] CRAN (R 4.0.2)
#>  classInt      0.4-3    2020-04-07 [1] CRAN (R 4.0.3)
#>  cli           2.2.0    2020-11-20 [1] CRAN (R 4.0.3)
#>  codetools     0.2-18   2020-11-04 [1] CRAN (R 4.0.3)
#>  crayon        1.3.4    2017-09-16 [1] CRAN (R 4.0.3)
#>  DBI           1.1.0    2019-12-15 [1] CRAN (R 4.0.3)
#>  desc          1.2.0    2018-05-01 [1] CRAN (R 4.0.3)
#>  devtools      2.3.2    2020-09-18 [1] CRAN (R 4.0.3)
#>  digest        0.6.27   2020-10-24 [1] CRAN (R 4.0.3)
#>  doParallel    1.0.16   2020-10-16 [1] CRAN (R 4.0.3)
#>  dplyr       * 1.0.2    2020-08-18 [1] CRAN (R 4.0.3)
#>  e1071         1.7-4    2020-10-14 [1] CRAN (R 4.0.3)
#>  ellipsis      0.3.1    2020-05-15 [1] CRAN (R 4.0.3)
#>  evaluate      0.14     2019-05-28 [1] CRAN (R 4.0.3)
#>  fansi         0.4.1    2020-01-08 [1] CRAN (R 4.0.3)
#>  fasterize     1.0.3    2020-07-27 [1] CRAN (R 4.0.3)
#>  foreach       1.5.1    2020-10-15 [1] CRAN (R 4.0.3)
#>  fs            1.5.0    2020-07-31 [1] CRAN (R 4.0.3)
#>  generics      0.1.0    2020-10-31 [1] CRAN (R 4.0.3)
#>  glue          1.4.2    2020-08-27 [1] CRAN (R 4.0.3)
#>  highr         0.8      2019-03-20 [1] CRAN (R 4.0.3)
#>  htmltools     0.5.0    2020-06-16 [1] CRAN (R 4.0.3)
#>  iterators     1.0.13   2020-10-15 [1] CRAN (R 4.0.3)
#>  KernSmooth    2.23-18  2020-10-29 [1] CRAN (R 4.0.3)
#>  knitr         1.30     2020-09-22 [1] CRAN (R 4.0.3)
#>  lattice       0.20-41  2020-04-02 [2] CRAN (R 4.0.2)
#>  lifecycle     0.2.0    2020-03-06 [1] CRAN (R 4.0.3)
#>  magrittr      2.0.1    2020-11-17 [1] CRAN (R 4.0.3)
#>  memoise       1.1.0    2017-04-21 [1] CRAN (R 4.0.3)
#>  nlme          3.1-150  2020-10-24 [1] CRAN (R 4.0.3)
#>  pillar        1.4.7    2020-11-20 [1] CRAN (R 4.0.2)
#>  pkgbuild      1.1.0    2020-07-13 [1] CRAN (R 4.0.3)
#>  pkgconfig     2.0.3    2019-09-22 [1] CRAN (R 4.0.3)
#>  pkgload       1.1.0    2020-05-29 [1] CRAN (R 4.0.3)
#>  prettyunits   1.1.1    2020-01-24 [1] CRAN (R 4.0.3)
#>  processx      3.4.4    2020-09-03 [1] CRAN (R 4.0.3)
#>  ps            1.4.0    2020-10-07 [1] CRAN (R 4.0.3)
#>  purrr         0.3.4    2020-04-17 [1] CRAN (R 4.0.3)
#>  R6            2.5.0    2020-10-28 [1] CRAN (R 4.0.3)
#>  raster        3.4-5    2020-11-14 [1] CRAN (R 4.0.3)
#>  Rcpp          1.0.5    2020-07-06 [1] CRAN (R 4.0.3)
#>  remotes       2.2.0    2020-07-21 [1] CRAN (R 4.0.3)
#>  rgeos         0.5-5    2020-09-07 [1] CRAN (R 4.0.3)
#>  rlang         0.4.8    2020-10-08 [1] CRAN (R 4.0.3)
#>  rmarkdown     2.5      2020-10-21 [1] CRAN (R 4.0.3)
#>  RPostgreSQL   0.6-2    2017-06-24 [1] CRAN (R 4.0.3)
#>  rprojroot     2.0.2    2020-11-15 [1] CRAN (R 4.0.2)
#>  sessioninfo   1.1.1    2018-11-05 [1] CRAN (R 4.0.3)
#>  sf            0.9-6    2020-09-13 [1] CRAN (R 4.0.3)
#>  sp            1.4-4    2020-10-07 [1] CRAN (R 4.0.3)
#>  stringi       1.5.3    2020-09-09 [1] CRAN (R 4.0.3)
#>  stringr       1.4.0    2019-02-10 [1] CRAN (R 4.0.3)
#>  testthat      3.0.0    2020-10-31 [1] CRAN (R 4.0.3)
#>  tibble        3.0.4    2020-10-12 [1] CRAN (R 4.0.3)
#>  tidyselect    1.1.0    2020-05-11 [1] CRAN (R 4.0.3)
#>  units         0.6-7    2020-06-13 [1] CRAN (R 4.0.3)
#>  usethis       1.6.3    2020-09-17 [1] CRAN (R 4.0.3)
#>  vctrs         0.3.4    2020-08-29 [1] CRAN (R 4.0.3)
#>  withr         2.3.0    2020-09-22 [1] CRAN (R 4.0.3)
#>  xfun          0.19     2020-10-30 [1] CRAN (R 4.0.3)
#>  yaml          2.2.1    2020-02-01 [1] CRAN (R 4.0.3)
#> 
#> [1] C:/Users/ke76dimu/R/win-library/4.0
#> [2] C:/Program Files/R/R-4.0.2/library
```

Of course, for the hybrids (the names with the x prefix) a lowercase first letter is expected, but what about the other species?
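A minimal base R sketch of the same check that leaves the hybrid names out, run on the names copied from the reprex output so it needs no database connection. The hybrid pattern used here (a leading x, an optional underscore, then an uppercase genus initial) is an assumption read off this particular output, not a documented BIEN convention. POSIX classes such as [[:lower:]] are used instead of [a-z] because letter ranges can be locale-dependent in R regular expressions.

``` r
# Species names copied from the reprex output above
range_names <- c(
  "chamomilla_chamomilla", "dubius_dubius", "dubius_subsp",
  "lachenalii_subsp", "polytaenium_jenmani", "syngonanthus_nitens",
  "x_Catyclia", "x_Agrohordeum", "x_Agropogon", "x_Elyhordeum",
  "x_Elysitanion", "x_Festulolium", "x_Pseudelymus",
  "xPseudelymus_saxicola", "x_Stiporyzopsis", "anthurium_geherrerae"
)

# Assumption: hybrid names start with "x", an optional "_", then an uppercase initial
is_hybrid <- grepl("^x_?[[:upper:]]", range_names)

# Names whose first character is a lowercase letter
starts_lower <- grepl("^[[:lower:]]", range_names)

# Lowercase names that are not explained by the hybrid prefix
suspect <- range_names[starts_lower & !is_hybrid]
print(suspect)
```

On this input, the nine x-prefixed names are dropped and the remaining seven lowercase names are the ones that look like genuine capitalization errors.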

bmaitner commented 3 years ago

Pinging @ojalaquellueva on this one. Brad, any ideas? Since the mistake is in the maps and not in vfoi, presumably this is a range pipeline issue? I also guess this means that the previous set of maps wasn't TNRSed?

ojalaquellueva commented 3 years ago

@bmaitner: This is a legacy of how ranges were built in the past, before I came on board. Basically, the species names in the ranges table had to match the basename (minus extension) of the range file as a single string without spaces (hence the underscores). The genus was also inconsistently converted to lower case.
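Until a well-formed binomial is available from the backend, a client-side stopgap could undo the two quirks described above. The helper to_binomial() below is hypothetical and not part of the BIEN package; it only swaps underscores for spaces and capitalizes the first letter, leaving names that match the assumed hybrid pattern (x, optional space, uppercase initial) unchanged.

``` r
# Hypothetical helper, not part of BIEN: turn an internal range name into a display binomial
to_binomial <- function(internal_name) {
  # Range-file basenames use "_" where a space belongs
  name <- gsub("_", " ", internal_name, fixed = TRUE)
  # Assumption: hybrids start with "x", an optional space, then an uppercase initial
  hybrid <- grepl("^x ?[[:upper:]]", name)
  # Capitalize the first letter of every name
  capitalized <- paste0(toupper(substr(name, 1, 1)), substring(name, 2))
  # Keep hybrid names as they are, use the capitalized form otherwise
  ifelse(hybrid, name, capitalized)
}

to_binomial(c("anthurium_geherrerae", "syngonanthus_nitens", "x_Agropogon"))
```

This only fixes capitalization and spacing. It cannot repair names that look incomplete, such as dubius_subsp, or validate names the way TNRS would, so a well-formed binomial from the database is still preferable.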

Email me the SQL used to retrieve the species names and range info and I will come up with a solution. If you are querying the new ranges schema, then there is a separate species table with both the internal species name (with underscores and weird capitalization) and the well-formed binomial. Just query that. If you are querying the old ranges table in the main analytical schema, then I can add a column with the well-formed binomial.
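A sketch of the lookup described above, done in R after the query: join the range names to a table holding both the internal name and the well-formed binomial, keeping every range name. The data frame species_lookup and its columns species_internal and species_binomial are hypothetical stand-ins with illustrative rows; the real table and column names in the new ranges schema are not given in this thread.

``` r
# Hypothetical stand-in for the species table in the new ranges schema
species_lookup <- data.frame(
  species_internal = c("anthurium_geherrerae", "syngonanthus_nitens"),
  species_binomial = c("Anthurium geherrerae", "Syngonanthus nitens"),
  stringsAsFactors = FALSE
)

# Shaped like the output of BIEN_ranges_box(..., species.names.only = TRUE)
ranges <- data.frame(
  species = c("anthurium_geherrerae", "syngonanthus_nitens"),
  stringsAsFactors = FALSE
)

# Left join: keep every range name and attach the binomial where one is known
ranges_clean <- merge(ranges, species_lookup,
                      by.x = "species", by.y = "species_internal",
                      all.x = TRUE, sort = FALSE)
print(ranges_clean)
```

Joining on the internal name, rather than reformatting strings client-side, leaves the taxonomic decisions to whatever produced the binomial column.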

Rekyt commented 3 years ago

Thank you both for the quick answer! @ojalaquellueva, how can I tell whether I'm querying the old or the new ranges schema? Which table should I use?

ojalaquellueva commented 3 years ago

@Rekyt: That was a backend question for Brian. Once I hear back from Brian, I'll know whether to push out a fix or whether the solution is on Brian's end.