gbif / pipelines

Pipelines for data processing (GBIF and LivingAtlases)

Updated `write_dwc()` function to process Camtrap DP to Darwin Core Archives #1069

Open peterdesmet opened 1 month ago

peterdesmet commented 1 month ago

@fmendezh our team has updated the `write_dwc()` function that converts Camtrap DP datasets to Darwin Core Archives. We suggest using this new function as part of the pipeline that processes incoming Camtrap DP datasets.

Changes

The function documentation can be found at https://inbo.github.io/camtrapdp/reference/write_dwc.html

Calling the function

To convert a Camtrap DP dataset to a Darwin Core Archive, two functions need to be called (similar to how it was done in "camtraptor"):

devtools::install_github("inbo/camtrapdp")
library(camtrapdp)

# 1. Not done here: download dataset from IPT + unzip
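#    A minimal sketch of this step in base R (the URL below is a hypothetical
#    placeholder for the dataset's download link on the IPT, not a real endpoint):
#    zip_file <- tempfile(fileext = ".zip")
#    download.file("https://example-ipt.org/resource/camtrap-dp.zip", zip_file, mode = "wb")
#    unzip(zip_file, exdir = "camtrap-dp")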

# 2. Read dataset into memory (via datapackage.json file)
x <- read_camtrapdp("https://raw.githubusercontent.com/tdwg/camtrap-dp/main/example/datapackage.json")

# 3. Convert data to DwC-A
my_dir <- "dwc"
write_dwc(x, directory = my_dir)
#> 
#> ── Transforming data to Darwin Core ──
#> 
#> ── Writing files ──
#> 
#> • 'dwc/dwc_occurrence.csv'
#> • 'dwc/dwc_audiovisual.csv'
#> • 'dwc/meta.xml'

Created on 2024-05-21 with reprex v2.1.0
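
For reference, here is a quick way to inspect the resulting files after running the above. This is a minimal sketch in base R; it assumes the three files listed in the output and that the occurrence CSV uses Darwin Core terms as column headers.

list.files(my_dir)
# dwc_audiovisual.csv, dwc_occurrence.csv, meta.xml
occurrence <- read.csv(file.path(my_dir, "dwc_occurrence.csv"))
names(occurrence)  # Darwin Core terms used as column names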

Questions

  1. Is the pipeline in #803 implemented in production? If so, can #803 be closed?
  2. Can you test the pipeline with the new function and report any issues?
    • I have tested it on a large dataset (https://ipt.gbif-uat.org/resource?r=mica-full) without issues.
    • The content of some Darwin Core terms has changed, which might have downstream effects (e.g. license now contains a license code like CC0-1.0); see the sketch below this list.
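
To illustrate the kind of downstream adjustment this could require, here is a hypothetical sketch that maps license codes back to full license URIs. The license column name and the lookup entries are assumptions for illustration only, not the pipeline's actual mapping.

# Hypothetical lookup from license code to license URI
license_uri <- c(
  "CC0-1.0" = "https://creativecommons.org/publicdomain/zero/1.0/",
  "CC-BY-4.0" = "https://creativecommons.org/licenses/by/4.0/",
  "CC-BY-NC-4.0" = "https://creativecommons.org/licenses/by-nc/4.0/"
)
occurrence <- read.csv("dwc/dwc_occurrence.csv")
# Replace the code with the corresponding URI (NA if the code is not in the lookup)
occurrence$license <- unname(license_uri[occurrence$license])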

Once all (potential) issues are resolved, we will publish a stable (minor) release of the package. Let me know if you have any questions.