gbif / pipelines

Pipelines for data processing (GBIF and LivingAtlases)
Apache License 2.0

Adding new fields to the ABCD > DarwinCore mapping file #1101

Open jholetschek opened 3 weeks ago

jholetschek commented 3 weeks ago

I just had a look at the ABCD 2 > DwC mapping file that defines which ABCD fields are extracted and indexed by GBIF: https://github.com/gbif/pipelines/blob/dev/sdks/tools/archives-converters/src/main/resources/mapping/indexMapping_abcd_2_0_6.properties

I'm wondering how namespace-aware the GBIF harvester is. In the mapping file, only the relative paths below Units/... are given (which tell the harvester where to find the occurrence data for DarwinCore). I'm asking because BGBM would love to also provide the IDs for collector and identifier (recordedByID and identifiedByID). ABCD 2 doesn't have these fields, but ABCD 3 does. So what would happen if I just provided ABCD 3 elements in an otherwise ABCD 2 document?
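To make the namespace question concrete, here is a minimal Python sketch of the two behaviours a path-based extractor could have. It is not the GBIF harvester's code. The document layout, the `abcd3` placeholder namespace, the agent URI, and both helper functions are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces, for illustration only. The ABCD 3 value is a placeholder.
ABCD2_NS = "http://www.tdwg.org/schemas/abcd/2.06"
ABCD3_NS = "http://example.org/abcd3"

# Hypothetical ABCD 2 unit carrying one element from another namespace.
DOC = f"""
<Unit xmlns="{ABCD2_NS}" xmlns:abcd3="{ABCD3_NS}">
  <Gathering>
    <Agents>
      <GatheringAgentsText>Jane Doe</GatheringAgentsText>
      <abcd3:ResourceURI>https://example.org/agent/1</abcd3:ResourceURI>
    </Agents>
  </Gathering>
</Unit>
"""


def find_strict(root, path, ns):
    """Namespace-aware lookup: every path step must be in namespace `ns`."""
    qualified = "/".join(f"{{{ns}}}{step}" for step in path.split("/"))
    return [el.text for el in root.findall(qualified)]


def find_by_local_name(root, path):
    """Namespace-agnostic lookup: match local names only (Python 3.8+)."""
    wildcard = "/".join(f"{{*}}{step}" for step in path.split("/"))
    return [el.text for el in root.findall(wildcard)]


root = ET.fromstring(DOC)
# A strict ABCD 2 lookup does not see the foreign-namespace element.
print(find_strict(root, "Gathering/Agents/ResourceURI", ABCD2_NS))
# A local-name lookup picks it up regardless of namespace.
print(find_by_local_name(root, "Gathering/Agents/ResourceURI"))
```

If the extractor behaves like `find_strict`, adding the path to the mapping file alone would not be enough. If it behaves like `find_by_local_name`, a new mapping entry might suffice.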

The gathering agent (= dwc:collector) node would then look like this:

Image

GatheringAgentsText is already indexed by GBIF, and I'd like to add ResourceURI to the mapping file. Would that work?

What could make things a bit more difficult is the fact that ResourceURI is repeatable in ABCD 3, so I guess there would be code involved. A solution might be to make it non-repeatable in the ABCD 2 document (which would be possible), so it would be provided concatenated as in DarwinCore.
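As a rough sketch of the "concatenated as in DarwinCore" option: repeated values can be joined with the " | " separator that Darwin Core recommends for lists. The XML fragment, the URIs, and the helper name `concat_resource_uris` are hypothetical. The code only illustrates the flattening step and is not part of the pipelines code.

```python
import xml.etree.ElementTree as ET

# Darwin Core recommends separating list values with space, vertical bar, space.
DWC_SEPARATOR = " | "


def concat_resource_uris(agents_xml: str) -> str:
    """Join all non-empty ResourceURI values in document order."""
    root = ET.fromstring(agents_xml)
    uris = [
        el.text.strip()
        for el in root.iter("ResourceURI")
        if el.text and el.text.strip()
    ]
    return DWC_SEPARATOR.join(uris)


# Hypothetical fragment with a repeated ResourceURI (no namespaces, for brevity).
FRAGMENT = """
<Agents>
  <GatheringAgentsText>Jane Doe; John Roe</GatheringAgentsText>
  <ResourceURI>https://example.org/agent/1</ResourceURI>
  <ResourceURI>https://example.org/agent/2</ResourceURI>
</Agents>
"""

print(concat_resource_uris(FRAGMENT))
```

Doing this on the provider side keeps the element non-repeatable in the ABCD 2 document, so a single mapping entry could carry the whole list.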

Thanks and cheers,
Jörg

timrobertson100 commented 23 hours ago

@fmendezh - can you please comment on the feasibility / timeline for this? @jholetschek has pinged me and indicated that this is an important issue for them, as they need to know how to proceed.