gbif / pipelines

Pipelines for data processing (GBIF and LivingAtlases)
Apache License 2.0

Camtrap DP Proof Of Concept #803

Open fmendezh opened 1 year ago

fmendezh commented 1 year ago

Support a basic ingestion test pipeline to publish CamtrapDP from the GBIF IPT https://github.com/gbif/ipt/issues/1829.

This Proof of Concept will use the Camtraptor R function to transform a CamtrapDP into a DwC-A. That function has been encapsulated in a Docker container to isolate and facilitate its execution; see https://github.com/gbif/camtrap-dp-pipeline for more details.

This will require some minor changes to the GBIF Registry, the GBIF Crawler, the GBIF Postal Service and the GBIF API to handle new endpoint types and extend the current processing pipelines.

In principle, a basic workflow to ingest CamtrapDP could be:

  1. The GBIF Crawler downloads and extracts a CamtrapDP archive.
  2. A new pipelines process (CLI), via a dockerized Camtraptor server, converts the data package into an expanded DwC-A (see the sketch after this list).
  3. The current pipeline workflow continues handling the archive as a DwC-A.
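
A minimal sketch of how step 2 could be wired in Java (the language of gbif/pipelines). The HTTP endpoint, port, path and parameter names of the dockerized Camtraptor server are assumptions for illustration, not taken from gbif/camtrap-dp-pipeline:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CamtrapDpToDwcaStep {

  public static void main(String[] args) throws Exception {
    // Location where the GBIF Crawler extracted the CamtrapDP archive (step 1)
    Path unpackedDataPackage = Paths.get(args[0]);
    // Target directory for the expanded DwC-A consumed by the existing pipeline (step 3)
    Path dwcaOutput = Paths.get(args[1]);

    HttpClient client = HttpClient.newHttpClient();

    // Hypothetical conversion endpoint on the dockerized Camtraptor server
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:8080/convert"
            + "?source=" + unpackedDataPackage.toUri()
            + "&target=" + dwcaOutput.toUri()))
        .POST(HttpRequest.BodyPublishers.noBody())
        .build();

    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    if (response.statusCode() != 200) {
      throw new IllegalStateException("CamtrapDP -> DwC-A conversion failed: " + response.body());
    }
    // From here on, the archive in dwcaOutput is handled like any other DwC-A (step 3).
  }
}
```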
peterdesmet commented 1 year ago

Nice. Just for clarification: the pipeline would produce:

  1. meta.xml (pending, see https://github.com/inbo/camtraptor/issues/175)
  2. dwc_occurrence.csv
  3. dwc_audubon.csv

An eml.xml would NOT be created. The aim is that metadata registration/updates will be handled by the IPT.
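
For illustration, a hypothetical sanity check (not existing pipelines code) that the conversion step could apply to the camtraptor output, reflecting only the file list above: meta.xml, dwc_occurrence.csv and dwc_audubon.csv are expected, while eml.xml must be absent because the IPT owns the metadata:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class CamtraptorOutputCheck {

  public static void verify(Path dwcaDir) {
    // Files the conversion is expected to produce
    for (String expected : List.of("meta.xml", "dwc_occurrence.csv", "dwc_audubon.csv")) {
      if (!Files.exists(dwcaDir.resolve(expected))) {
        throw new IllegalStateException("Missing expected file: " + expected);
      }
    }
    // eml.xml must not be written by the conversion; metadata is handled by the IPT
    if (Files.exists(dwcaDir.resolve("eml.xml"))) {
      throw new IllegalStateException("Unexpected eml.xml: metadata is handled by the IPT");
    }
  }
}
```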