Check out the vignettes with detailed documentation on each module of the bdc package
https://brunobrr.github.io/bdc
GNU General Public License v3.0

bdc

A toolkit for standardizing, integrating, and cleaning biodiversity data

Overview

Handling biodiversity data from several different sources is not an easy task. Here, we present the Biodiversity Data Cleaning (bdc), an R package to address quality issues and improve the fitness-for-use of biodiversity datasets. bdc contains functions to harmonize and integrate data from different sources following common standards and protocols, and implements various tests and tools to flag, document, clean, and correct taxonomic, spatial, and temporal data.

Compared to other available R packages, the main strength of the bdc package is that it brings together available tools (and a series of new ones) to assess the quality of different dimensions of biodiversity data in a single, flexible toolkit. The functions can be applied to a multitude of taxonomic groups and datasets (including regional or local repositories), at the scale of individual countries or worldwide.

Structure of bdc

The bdc toolkit is organized in thematic modules related to different biodiversity dimensions.


⚠️ The modules illustrated, and the functions within them, are linked to form a proposed reproducible workflow (see vignettes). However, all functions can also be executed independently.



1. Merge databases

Standardization and integration of different datasets into a standard database.
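
The idea behind this module can be illustrated with a minimal base R sketch. It does not use bdc itself: the two data frames, the helper `standardize_columns()`, and the column mapping are hypothetical, and the target column names are assumed to follow Darwin Core terms.

```r
# Two hypothetical sources that store the same fields under different names.
source_a <- data.frame(
  species = c("Puma concolor", "Panthera onca"),
  lat = c(-15.6, -3.1),
  lon = c(-47.9, -60.0),
  stringsAsFactors = FALSE
)
source_b <- data.frame(
  taxon = "Tapirus terrestris",
  latitude = -22.9,
  longitude = -43.2,
  stringsAsFactors = FALSE
)

# Hypothetical helper: select and rename columns to a shared schema.
# `mapping` is a named vector of the form c(standard_name = "original_name").
standardize_columns <- function(data, mapping, source_name) {
  out <- data[, unname(mapping), drop = FALSE]
  names(out) <- names(mapping)
  out$source <- source_name  # keep track of where each record came from
  out
}

# Stack the standardized sources into one database.
merged <- rbind(
  standardize_columns(
    source_a,
    c(scientificName = "species", decimalLatitude = "lat", decimalLongitude = "lon"),
    "source_a"
  ),
  standardize_columns(
    source_b,
    c(scientificName = "taxon", decimalLatitude = "latitude", decimalLongitude = "longitude"),
    "source_b"
  )
)
```

Keeping a `source` column makes it possible to trace any later flag back to the dataset it came from.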

2. Pre-filter

Flagging and removal of invalid or non-interpretable information, followed by data amendments (e.g., correct transposed coordinates and standardize country names).
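
As a rough illustration of what flagging means here, the following base R sketch marks records with missing names or out-of-range coordinates. It is not the bdc implementation: the data and the flag columns `name_ok` and `coords_ok` are hypothetical names chosen for this example.

```r
# Hypothetical records with some invalid entries.
records <- data.frame(
  scientificName = c("Puma concolor", "", NA, "Panthera onca"),
  decimalLatitude = c(-15.6, 10.2, 5.0, 95.0),
  decimalLongitude = c(-47.9, -200.0, 20.0, -60.0),
  stringsAsFactors = FALSE
)

# Convention used in this sketch: TRUE = record passes, FALSE = record is flagged.
# A name fails if it is missing or contains only whitespace.
records$name_ok <- !is.na(records$scientificName) &
  nzchar(trimws(records$scientificName))

# Coordinates fail if they are missing or outside the valid ranges.
records$coords_ok <- !is.na(records$decimalLatitude) &
  !is.na(records$decimalLongitude) &
  abs(records$decimalLatitude) <= 90 &
  abs(records$decimalLongitude) <= 180

# Split the data into records to keep and records to inspect.
passes_all <- records$name_ok & records$coords_ok
clean <- records[passes_all, ]
flagged <- records[!passes_all, ]
```

Flagging first and removing later keeps the decision to discard a record separate from the test that detected the problem.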

3. Taxonomy

Cleaning, parsing, and harmonization of scientific names against multiple taxonomic references.
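
A simplified picture of name cleaning and matching, in base R only, might look like the sketch below. The helper `clean_name()` and the tiny `reference` vector are hypothetical; real taxonomic harmonization, as done by bdc against taxonomic references, handles many more cases (authorship, synonyms, fuzzy matches).

```r
# Hypothetical helper: remove the qualifiers "cf." and "aff.", collapse
# repeated whitespace, and capitalize the first letter of the name.
clean_name <- function(x) {
  x <- gsub("(^|\\s)(cf|aff)\\.?(?=\\s|$)", " ", x, perl = TRUE, ignore.case = TRUE)
  x <- trimws(gsub("\\s+", " ", x))
  sub("^([a-z])", "\\U\\1", x, perl = TRUE)
}

raw_names <- c("  puma  cf. concolor ", "Panthera aff. onca", "Tapirus terrestris")
cleaned <- clean_name(raw_names)

# Hypothetical reference list standing in for a taxonomic database.
reference <- c("Puma concolor", "Panthera onca")

result <- data.frame(
  original = raw_names,
  cleaned = cleaned,
  found_in_reference = cleaned %in% reference,
  stringsAsFactors = FALSE
)
```

Storing the original name next to the cleaned one keeps every change traceable.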

4. Space

Flagging of erroneous, suspicious, and low-precision geographic coordinates.
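
To make "low-precision coordinates" concrete, here is a minimal base R sketch that counts decimal places and flags coordinates below a threshold. The helper `count_decimals()` and the threshold of two decimal places are assumptions made for this example, not values taken from bdc. The helper assumes non-missing values that R prints without scientific notation.

```r
# Hypothetical helper: number of decimal places in each value.
# as.character() drops trailing zeros, so 10.50 counts as one decimal place.
count_decimals <- function(x) {
  decimals <- sub("^[^.]*\\.?", "", as.character(x))
  nchar(decimals)
}

coords <- data.frame(
  decimalLatitude = c(-15.7801, -3, 10.5),
  decimalLongitude = c(-47.9292, -60, 20.25)
)

min_decimals <- 2  # assumed threshold for this sketch

# TRUE = both coordinates have at least `min_decimals` decimal places.
coords$precision_ok <- count_decimals(coords$decimalLatitude) >= min_decimals &
  count_decimals(coords$decimalLongitude) >= min_decimals
```

Coordinates rounded to whole degrees can be more than 100 km away from the true location, which is why precision is worth flagging separately from validity.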

5. Time

Flagging and, whenever possible, correction of inconsistent collection dates.
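
A date check of this kind can be sketched in base R as follows. The lower bound of 1900 and the assumption that dates start with a four-digit year are choices made for this example only.

```r
# Hypothetical collection dates, assumed to start with a four-digit year.
event_date <- c("2010-05-01", "1750-01-01", "2999-12-31", NA)

# Extract the year; values that cannot be parsed become NA.
year <- suppressWarnings(as.integer(substr(event_date, 1, 4)))

min_year <- 1900L  # assumed lower bound for plausible collection years
current_year <- as.integer(format(Sys.Date(), "%Y"))

# TRUE = plausible year; missing, too old, or future years are flagged (FALSE).
year_ok <- !is.na(year) & year >= min_year & year <= current_year
```

Whether a missing date should count as a failure depends on the analysis; this sketch treats it as one.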

Other functions

To facilitate the documentation, visualization, and interpretation of the results of data-quality tests, the package contains functions for documenting the results of the data-cleaning tests, including functions for saving i) records needing further inspection, ii) figures, and iii) data-quality reports.
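
The sketch below shows, in base R, one simple way to save records needing inspection and to build a small summary table. It does not use the bdc reporting functions; the data, the flag column `coords_ok`, and the output file name are hypothetical.

```r
# Hypothetical records with one flag column (TRUE = passes the test).
records <- data.frame(
  scientificName = c("Puma concolor", "Panthera onca", "Tapirus terrestris"),
  coords_ok = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# i) Save the records that need further inspection.
needs_inspection <- records[!records$coords_ok, , drop = FALSE]
out_dir <- file.path(tempdir(), "bdc_sketch_output")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
out_file <- file.path(out_dir, "records_to_inspect.csv")
write.csv(needs_inspection, out_file, row.names = FALSE)

# iii) A minimal data-quality summary: how many records each test flagged.
summary_table <- data.frame(
  test = "coords_ok",
  n_flagged = sum(!records$coords_ok),
  n_total = nrow(records)
)
```

Writing to `tempdir()` keeps the example free of side effects; in a real workflow you would choose a project folder instead.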

Installation

GNparser installation

Before installing bdc, you need to install GNparser. First, download the gnparser binary file for your operating system. For example, download the file using R as follows:


download.file(url = "file_link", 
              destfile = "destination_path")

The downloaded file has the extension .zip or .gz.

Mac OS

Extract the gnparser binary file from the .zip or .gz file and move it to the folder ~/Library/Application Support/. You can do this manually or using R:


# Extract the gnparser file into the working directory
# (adjust the file name to match the archive you downloaded)
untar("~/Downloads/gnparser-v1.9.1-mac.tar.gz")

# Copy it to the target folder
file.copy("./gnparser", "~/Library/Application Support/")

Linux

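Extract the gnparser binary file from the .zip or .gz file and move it to the folder ~/bin. You can do this manually or using R:


# Extract the gnparser file into the working directory
untar("~/Downloads/gnparser-v1.9.1-linux.tar.gz")

# Create ~/bin if it does not exist yet; otherwise file.copy() would
# write a file named "bin" instead of copying into a folder
dir.create("~/bin", showWarnings = FALSE)

# Copy it to the target folder
file.copy("./gnparser", "~/bin")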

Windows

On Windows, extract the gnparser binary file from the .zip file. Then move the extracted gnparser folder to the AppData folder. The following R code unzips the file, finds the AppData path, and copies gnparser there:


# Unzip the downloaded file (replace "gnparser.zip" with the path to your download)
unzip("gnparser.zip", exdir = "destination_path/gnparser")

# Find the AppData path
AppData_path <- Sys.getenv("AppData")

# Copy the gnparser folder to AppData
file.copy("destination_path/gnparser", AppData_path, recursive = TRUE)
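
Whichever operating system you use, you may want to confirm that gnparser ended up in the folder named above. The following is a minimal sketch: the helper `gnparser_dir()` is hypothetical and only encodes the three target folders listed in this section. It does not check how bdc itself locates gnparser.

```r
# Hypothetical helper: target folder for gnparser on each operating system,
# as described in the installation steps above.
gnparser_dir <- function(sysname = Sys.info()[["sysname"]]) {
  switch(
    sysname,
    Darwin = path.expand("~/Library/Application Support"),
    Linux = path.expand("~/bin"),
    Windows = Sys.getenv("AppData"),
    stop("Unsupported operating system: ", sysname)
  )
}

# On Mac OS and Linux this path is the binary itself; on Windows it is the
# copied gnparser folder. file.exists() returns TRUE for both files and folders.
gnparser_path <- file.path(gnparser_dir(), "gnparser")

if (file.exists(gnparser_path)) {
  message("gnparser found at: ", gnparser_path)
} else {
  message("gnparser not found at: ", gnparser_path)
}
```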

bdc installation

After installing GNparser, you can install bdc from CRAN:

install.packages("bdc")
library(taxadb)

or the development version from GitHub using:

install.packages("remotes")
remotes::install_github("brunobrr/bdc")

Load the package with:

library(bdc)

Package website

See the bdc package website (https://brunobrr.github.io/bdc/) for a detailed explanation of each module.

Getting help

If you encounter a clear bug, please file an issue on the GitHub repository (brunobrr/bdc). For questions or suggestions, please send us an email (ribeiro.brr@gmail.com).

Citation

Ribeiro, BR; Velazco, SJE; Guidoni-Martins, K; Tessarolo, G; Jardim, L; Bachman, SP; Loyola, R (2022). bdc: A toolkit for standardizing, integrating, and cleaning biodiversity data. Methods in Ecology and Evolution. https://doi.org/10.1111/2041-210X.13868