bmaitner / RBIEN

Tools for accessing the Botanical Information and Ecology Network (BIEN) database
http://bien.nceas.ucsb.edu/bien/

Comparison BIEN4/GBIF #34

Open basille opened 2 years ago

basille commented 2 years ago

Hey @bmaitner!

Following our conversation a couple of weeks ago, I'm finally taking the time to provide a comparison (with an example) between BIEN4 and GBIF data, using the two relevant R packages. I'll use the sycamore maple (Acer pseudoplatanus) for the illustration, although the choice of species probably doesn't matter. Here we go:

BIEN4 occurrence data

Note: This output comes from a query I ran a few days ago, because the BIEN servers seem unresponsive today (current notice: "The BIEN servers are currently undergoing updates and may be slower than usual at present.").

Information about BIEN:

library("BIEN")
BIEN_metadata_database_version()
  db_version db_release_date
1      4.2.5      2021-12-07

Get the data:

acps_bien <- BIEN_occurrence_species("Acer pseudoplatanus", 
    native.status = TRUE, 
    political.boundaries = TRUE)
dim(acps_bien)
[1] 1699   22

Keep only records collected after 1 January 1990:

acps_bien$date_collected <- lubridate::ymd(acps_bien$date_collected)
acps_bien <- subset(acps_bien, date_collected > lubridate::ymd("1990-01-01"))
dim(acps_bien)
[1] 728  22
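One detail worth checking when comparing the two sources: the filter above uses a strict `>` with no upper bound, while the GBIF query further down is restricted to 1990 to 2020 with both years included. A minimal sketch of a closed date window follows, using base R only. The `toy_bien` data frame, its `species` column, and the `in_window()` helper are hypothetical stand-ins for illustration, not part of the BIEN package.

```r
# Toy stand-in for the BIEN result; only the date_collected column matters here.
toy_bien <- data.frame(
    species = "Acer pseudoplatanus",
    date_collected = as.Date(c("1989-12-31", "1990-01-01", "2005-06-15",
                               "2020-12-31", "2021-03-02", NA))
)

# Hypothetical helper: TRUE for dates inside a closed window, FALSE otherwise.
# Missing dates are treated as outside the window.
in_window <- function(d,
                      from = as.Date("1990-01-01"),
                      to = as.Date("2020-12-31")) {
    !is.na(d) & d >= from & d <= to
}

# Keep only the rows that fall within 1990 to 2020, both ends included.
toy_window <- toy_bien[in_window(toy_bien$date_collected), ]
nrow(toy_window)
```

Applied to the real data, the same logical index could replace the `subset()` call, which also drops records without a collection date.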

Convert to sf class for mapping:

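library("sf")
library("ggplot2")
# Two objects used below are not defined in this post; plausible definitions (assumptions):
# world <- rnaturalearth::ne_countries(scale = "medium", returnclass = "sf")
# acps_nom_scient <- "Acer pseudoplatanus"
acps_bien <- st_as_sf(acps_bien, coords = c("longitude", "latitude"), remove = FALSE,
    crs = 4326, agr = "constant")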
ggplot(data = world) +
    geom_sf(color = gray(.5), fill= "antiquewhite") +
    geom_sf(data = acps_bien, size = .1, alpha = .2, col = "brown3") +
    coord_sf(xlim = c(2.5e6, 7e6), ylim = c(1.3e6, 5.3e6), crs = st_crs(3035)) +
    labs(
        x = "Longitude",
        y = "Latitude",
        title = acps_nom_scient,
        subtitle = "BIEN data"
    ) +
    theme(
        panel.grid.major = element_line(color = gray(.7),
        linetype = "dashed", size = 0.5),
        panel.background = element_rect(fill = "aliceblue"),
        plot.title = element_text(face = "italic")
    )

[Figure: acps-bien-carte-1, map of the BIEN occurrences]

GBIF occurrence data and comparison

Prepare the query and download the data:

library("rgbif")
acps_gbif_dl <- occ_download(
    pred("taxonKey", name_backbone(name = "Acer pseudoplatanus", rank = "species")$speciesKey), # Main key
    pred("hasGeospatialIssue", FALSE), # Remove default geospatial issues
    pred("hasCoordinate", TRUE),       # Keep only records with coordinates
    pred("occurrenceStatus","PRESENT"), # Remove absent records
    pred_not(pred_in("basisOfRecord",c("FOSSIL_SPECIMEN","LIVING_SPECIMEN"))), # Remove fossils and living specimens (zoo/botanical garden)
    pred_and( # Between 1990–2020 (both included)
        pred_gte("year", "1990"),
        pred_lte("year", "2020")),
    format = "SIMPLE_CSV"
)
occ_download_wait(acps_gbif_dl)
acps_gbif <- occ_download_get(acps_gbif_dl, path = "Data/gbif-acps/", overwrite = TRUE) |>
    occ_download_import()
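A practical note for anyone re-running this: `occ_download()` needs GBIF account credentials. When `user`, `pwd` and `email` are not passed explicitly, rgbif falls back to the `GBIF_USER`, `GBIF_PWD` and `GBIF_EMAIL` environment variables. The check below is only a sketch in base R; the variable names `gbif_vars` and `missing_vars` are illustrative.

```r
# Environment variables that rgbif reads when credentials are not given as arguments.
gbif_vars <- c("GBIF_USER", "GBIF_PWD", "GBIF_EMAIL")

# Sys.getenv() returns "" for unset variables, so collect the names that are empty.
missing_vars <- gbif_vars[Sys.getenv(gbif_vars) == ""]

if (length(missing_vars) > 0) {
    message("Set these before calling occ_download(): ",
            paste(missing_vars, collapse = ", "))
}
```

Setting them in `~/.Renviron` keeps the credentials out of the script itself.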

Remove records under a non-commercial license and check the result:

acps_gbif <- subset(acps_gbif, license != "CC_BY_NC_4_0")
dim(acps_gbif)
[1] 387557  50
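To see how much the license filter removes, it can help to tabulate the `license` column before subsetting. The sketch below uses a small made-up data frame (`toy_gbif`, with invented `gbifID` values) rather than the real download, so the counts are illustrative only.

```r
# Toy stand-in for the imported GBIF table; the values are invented for illustration.
toy_gbif <- data.frame(
    gbifID = 1:6,
    license = c("CC_BY_4_0", "CC_BY_NC_4_0", "CC0_1_0",
                "CC_BY_NC_4_0", "CC_BY_4_0", "CC0_1_0")
)

# Number of records per license before filtering.
license_counts <- table(toy_gbif$license)
print(license_counts)

# Same filter as above, then count how many records it dropped.
toy_open <- subset(toy_gbif, license != "CC_BY_NC_4_0")
n_dropped <- nrow(toy_gbif) - nrow(toy_open)
n_dropped
```

Running `table(acps_gbif$license)` on the real data before the `subset()` call would give the same breakdown for the actual download.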

Convert to sf class for mapping:

acps_gbif <- st_as_sf(acps_gbif, coords = c("decimalLongitude", "decimalLatitude"),
    remove = FALSE, crs = 4326, agr = "constant")
ggplot(data = world) +
    geom_sf(color = gray(.5), fill= "antiquewhite") +
    geom_sf(data = acps_gbif, size = .1, alpha = .05, col = "brown3") +
    coord_sf(xlim = c(2.5e6, 7e6), ylim = c(1.3e6, 5.3e6), crs = st_crs(3035)) +
    labs(
        x = "Longitude",
        y = "Latitude",
        title = acps_nom_scient,
        subtitle = "GBIF data"
    ) +
    theme(
        panel.grid.major = element_line(color = gray(.7),
        linetype = "dashed", size = 0.5),
        panel.background = element_rect(fill = "aliceblue"),
        plot.title = element_text(face = "italic")
    )

[Figure: acps-cartes-1, maps of the occurrences]

Summary

There is a striking difference between the two datasets: 728 records in BIEN vs. 387557 in GBIF, even after removing the GBIF records under a non-commercial license.
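To put a number on the gap, here is the ratio computed from the two record counts reported above (the variable names are illustrative):

```r
# Record counts taken from the dim() outputs earlier in this post.
n_bien <- 728
n_gbif <- 387557

# How many GBIF records there are for each BIEN record.
ratio <- n_gbif / n_bien
round(ratio)  # 532
```

So for this species and this period, GBIF returns roughly 530 times as many records as BIEN.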