NEONScience / NEON-utilities

Utilities and scripts for working with NEON data. Currently: an R package with functions to join (stack) the month-by-site files in downloaded NEON data, to convert data to geoCSV format, and to download data from the API.
GNU Affero General Public License v3.0

loadByProduct() returns incorrect encoding for Algae taxa, probably other taxonomic groups as well #88

Closed sokole closed 4 years ago

sokole commented 4 years ago

Function loadByProduct

Describe the bug

When algae data with taxonomy are downloaded with loadByProduct(), scientificName values containing non-ASCII characters come back with the wrong encoding. For example, in alg_taxonomyProcessed, when the taxonID 'NEONDREX6001' is returned in the data set, the scientificName should be "Amphipleura pellucida (Kützing) Kützing", but the function returns the name with both "ü" characters mis-encoded.

To Reproduce

#packages used
library(dplyr)
library(neonUtilities)

# in 
my_dpid <- "DP1.20120.001"

metadata_all <- neonUtilities::getProductInfo(my_dpid)

landing_page_url <- paste0("https://data.neonscience.org/data-products/", my_dpid)

#################

# Try this:
# only 2 sites, restricted in time
all_tabs_in <- neonUtilities::loadByProduct(
  dpID = "DP1.20166.001", 
  site = c("MAYF", "PRIN"),
  startdate = "2016-01", 
  enddate = "2018-11",  
  package = "expanded", 
  check.size = FALSE)

# view scientificName for taxonID 'NEONDREX6001'
all_tabs_in$alg_taxonomyRaw %>% filter(taxonID == 'NEONDREX6001') %>% slice(1) %>% select(scientificName)

# what I see (both "ü" characters come back mis-encoded):
# Amphipleura pellucida (Kützing) Kützing

Expected behavior

The above script should return "Amphipleura pellucida (Kützing) Kützing"

Additional context

I think the scientificNames are correct in the database. When I use restR to look up scientificNames from the taxonID, they have the correct encoding and format. So I think the issue is in extracting the text strings from the XML?
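
A minimal sketch of one mechanism that produces this kind of garbling, using base R only. It assumes (this is not confirmed anywhere in this thread) that the name is stored as UTF-8 and is somewhere interpreted as Latin-1, which would turn "ü" into "Ã¼". The variable names are illustrative.

```r
# Correct name, stored explicitly as UTF-8 (\u00fc is "ü").
correct <- enc2utf8("Amphipleura pellucida (K\u00fctzing) K\u00fctzing")

# Simulate the suspected problem: treat the UTF-8 bytes as if they were
# Latin-1 and convert them to UTF-8 again. Each "ü" (two bytes in UTF-8)
# becomes two separate characters.
garbled <- iconv(correct, from = "latin1", to = "UTF-8")

# Undo it: convert back to Latin-1 bytes, then declare those bytes as UTF-8.
# This only works if the garbling happened exactly as assumed above.
repaired <- iconv(garbled, from = "UTF-8", to = "latin1")
Encoding(repaired) <- "UTF-8"

print(garbled)
print(identical(repaired, correct))
```

If the string returned by loadByProduct() matches `garbled`, that would point to an encoding mix-up while reading the files rather than to bad values in the database.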

sokole commented 4 years ago

I can try to tackle this soonish. Mainly posting here to document the issue.

cklunch commented 4 years ago

Thanks @sokole ! This was an easy fix in readTableNEON(), fixed on GitHub now.
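
For readers hitting the same symptom in their own scripts, here is a hedged sketch of the general kind of fix: declaring the encoding when a table is read. This is not the actual change made in readTableNEON(), which is not shown in this thread. It uses base R `read.csv()` with its `encoding` argument on a temporary file created only for illustration.

```r
# Write a tiny UTF-8 CSV to a temporary file (illustrative data only).
tmp <- tempfile(fileext = ".csv")
lines <- enc2utf8(c(
  "taxonID,scientificName",
  "NEONDREX6001,Amphipleura pellucida (K\u00fctzing) K\u00fctzing"
))
con <- file(tmp, open = "wb")
writeLines(lines, con, useBytes = TRUE)  # write the UTF-8 bytes unchanged
close(con)

# Read it back, marking character data as UTF-8 so that "ü" is not
# reinterpreted in the platform's native encoding.
tab <- read.csv(tmp, encoding = "UTF-8", stringsAsFactors = FALSE)
unlink(tmp)

print(tab$scientificName)
```

`data.table::fread()` accepts a similar `encoding` argument, so the same idea applies if the files are read with data.table.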

cklunch commented 4 years ago

This fix is now on CRAN.