ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

DWC mapping #7348

Open dustymc opened 9 months ago

dustymc commented 9 months ago

The Map

https://docs.google.com/spreadsheets/d/1aCBYX9ErjicL8VdNdHbJUI0JTwWu6L4D_37gJ7IneRY/edit?gid=0#gid=0 will be the primary Arctos-->DWC mapping document; please make suggestions/corrections/etc in this issue.

Mapping Test

Here's a sample of DWC generated from the spreadsheet: temp_dwc_sample.csv.zip

Let me know if you need to see this with some particular data, or if there's anything I can do to make things clearer.

Goals

A clear and functional DWC mapping document.

Scope

This Issue is for mapping to "flat DWC" (DwC-A). Media/Audubon Core (existing mapping) can be addressed elsewhere. Extensions (new mapping) would also need dedicated Issues and justification, because some, perhaps most, don't do much.

Major Change

@mkoo and I believe the mapping should be simplified so that only each "best occurrence" (e.g., what's in FLAT) is shared via DWC. That's in line with current cataloging practices; what it excludes is mostly things like lower-quality georeferences; it will be a huge simplification in both mapping and understanding the data; and it will not require us to mint fake identifiers (which make GBIF nervous and might well end up in publications).
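To make the "best occurrence" idea concrete, here is a minimal sketch of collapsing several candidate georeferences into one row per catalog record, keyed by the record's existing identifier so no new identifiers are minted. Everything here is an assumption for illustration: the `Georeference` shape, the `accepted` flag, the "smallest uncertainty wins" tie-break, and the `EXAMPLE:Mamm:1` GUIDs are hypothetical and are not Arctos's actual FLAT logic.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

@dataclass
class Georeference:
    """One candidate georeference for a catalog record (hypothetical shape)."""
    catalog_guid: str                          # existing record identifier, reused as-is
    decimal_latitude: float
    decimal_longitude: float
    coordinate_uncertainty_m: Optional[float]  # None when not recorded
    accepted: bool                             # hypothetical "accepted/best" flag

def _uncertainty(g: Georeference) -> float:
    # Treat a missing uncertainty as the worst possible value.
    return g.coordinate_uncertainty_m if g.coordinate_uncertainty_m is not None else float("inf")

def _is_better(a: Georeference, b: Georeference) -> bool:
    # An accepted georeference always wins; otherwise prefer the smaller uncertainty.
    if a.accepted != b.accepted:
        return a.accepted
    return _uncertainty(a) < _uncertainty(b)

def best_occurrences(georefs: Iterable[Georeference]) -> Dict[str, Georeference]:
    """Keep exactly one georeference per catalog record, keyed by the record's own GUID."""
    best: Dict[str, Georeference] = {}
    for g in georefs:
        current = best.get(g.catalog_guid)
        if current is None or _is_better(g, current):
            best[g.catalog_guid] = g
    return best

candidates = [
    Georeference("EXAMPLE:Mamm:1", 64.80, -147.70, 5000.0, False),
    Georeference("EXAMPLE:Mamm:1", 64.85, -147.72, 100.0, True),
    Georeference("EXAMPLE:Mamm:2", 61.20, -149.90, None, False),
]
for guid, g in best_occurrences(candidates).items():
    print(guid, g.decimal_latitude, g.decimal_longitude)
```

Because the output has one row per existing catalog record, the record's own GUID can serve as the occurrenceID; sharing every georeference instead would require a separate identifier for each extra row.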

working comments

In progress: "translate" SQL (https://github.com/ArctosDB/PG_DDL/blob/master/shared_data/dwc_occurrence.sql) to spreadsheet (in a way that can be used to write dynamic SQL).
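A spreadsheet that can drive dynamic SQL essentially needs one row per Darwin Core term plus the SQL expression that produces it. The sketch below shows one way such rows could be turned into a single SELECT. The column names (`flat.guid`, `flat.cat_num`, and so on) and the source table name `flat` are hypothetical placeholders, not the real contents of the map or of `dwc_occurrence.sql`.

```python
import re
from typing import List, Tuple

# Hypothetical mapping rows, as they might be exported from the spreadsheet:
# (Darwin Core term, SQL expression that produces it).
MAPPING: List[Tuple[str, str]] = [
    ("occurrenceID", "flat.guid"),
    ("catalogNumber", "flat.cat_num"),
    ("scientificName", "flat.scientific_name"),
    ("decimalLatitude", "flat.dec_lat"),
]

# DwC term names are plain camelCase identifiers; anything else is rejected.
_TERM = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def build_select(mapping: List[Tuple[str, str]], source: str = "flat") -> str:
    """Build one SELECT statement that aliases each expression to its DwC term."""
    columns = []
    for term, expression in mapping:
        if not _TERM.fullmatch(term):
            raise ValueError(f"not a plain DwC term name: {term!r}")
        # Double-quote the alias so PostgreSQL keeps the camelCase spelling
        # instead of folding it to lowercase.
        columns.append(f'  {expression} AS "{term}"')
    return "SELECT\n" + ",\n".join(columns) + f"\nFROM {source}"

print(build_select(MAPPING))
```

The SQL expressions are passed through untouched, so this assumes the expression column is trusted; only the term names are checked.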

I'll merge related issues here so they can be addressed in context. It'll take a while.

Some possibly-related issues: https://github.com/ArctosDB/arctos/issues?q=is%3Aissue+is%3Aopen+label%3A%22Aggregator+issues%22

Jegelewicz commented 9 months ago

Are we mapping to DwC-A or GBIF's new model?

campmlc commented 9 months ago

I'm interested in participating.

ekrimmel commented 9 months ago

I am interested in participating, if y'all will have me!

Jegelewicz commented 9 months ago

Questions sent to @dbloom about publishing under GBIF's new data model.

  1. Is anyone actually doing this?
  2. If so, where do I find the schema? I can find use cases and make guesses, but I feel like there must be some expected format similar to the DwC-A
  3. If Arctos decided to publish under the new model, would it be problematic for the VertNet IPT?

Resources https://www.gbif.org/new-data-model

tucotuco commented 9 months ago
  1. Is anyone actually doing this?

The implementation of the GBIF data model to date is experimental. We are working through distinct use cases that cover different realms of biodiversity data. Material was the first, and we feel that we learned enough to cover that realm fairly well, but the publishing model is not ready for implementation yet. GBIF is not yet in a position to consume and aggregate those data wholesale, and it is not on the calendar yet. Arctos is in a great position to do it once it becomes possible; Arctos was at the core of the model design on two occasions.

  2. If so, where do I find the schema? I can find use cases and make guesses, but I feel like there must be some expected format similar to the DwC-A

Though it isn't ready to be implemented, the model as used in the Material Collections "experiment" is the closest thing there is to the eventual underlying model at GBIF. I expect Arctos will want to publish something close to this model, because the publishing model(s) will be simpler, and mapping to something less rich than what you see in that link would be unnecessary work.

  3. If Arctos decided to publish under the new model, would it be problematic for the VertNet IPT?

I expect that if Arctos maps to the underlying model, the VertNet IPT would be irrelevant for GBIF, as the IPT would only be able to support publishing models, not the underlying model. If Arctos ends up using a Material publishing model, that would be enabled in the VertNet IPT.

I hope that helps. I'm open to any other questions.


AJLinn commented 8 months ago

Can I ask a basic-level question?

Is DWC only relevant for non-cultural collections? Do we who manage cultural items in Arctos need to be engaged in these discussions for any potential impact to Arctos field names and/or functionality?

Jegelewicz commented 8 months ago

potential impact to Arctos field names and/or functionality?

@AJLinn this will have no impact on Arctos field names and/or functionality. We will be looking at how the fields in Arctos are mapped to Darwin Core for publishing to GBIF. If you want to publish your collections (which might be cool), then you might also be interested.

Jegelewicz commented 8 months ago

Also, here is the response from Dave.

  1. No. Not publicly. There has been some testing, but the new data model is just a model.
  2. You already have as much information as pretty much everyone. There is no public schema because it hasn't been completed.
  3. Nobody can use any IPT to publish with the new data model because A) see #2 and B) the IPT hasn't been modified and released to utilize the new model. So you may be eager, but it isn't possible to do yet.

There is nothing you have failed to ask. You just happen to be anticipating something that isn't real yet, at least not at scale or in any public way. You can review the Work Programme for 2024 at https://docs.gbif.org/2024-work-programme/en/#priority4. In it you will see that it might be 2027 before the new model is fully formed and ready for widespread use (see section 4.4). In the meantime, GBIF does have several goals for 2024 to expand the model to work with ecological, eDNA, and other types of data.

Jegelewicz commented 8 months ago

So, we are mapping to DwC-A and any extensions we would like to send.

Resources: Darwin Core Archive Assistant, User Guide, GBIF Registered Extensions
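For anyone new to DwC-A: an archive is a zip holding the data file(s), an `eml.xml` metadata document, and a `meta.xml` descriptor that says which column holds which Darwin Core term. The sketch below writes a minimal `meta.xml` for a tab-delimited Occurrence core. The file name `occurrence.txt` and the term list are assumptions for illustration; the actual Arctos export may be laid out differently, and `eml.xml` is not generated here.

```python
import xml.etree.ElementTree as ET
from typing import List

DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"   # namespace of meta.xml itself
DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"    # namespace of Darwin Core terms

def build_meta_xml(terms: List[str], core_file: str = "occurrence.txt") -> str:
    """Describe a tab-delimited Occurrence core file with one header line.
    Column 0 holds the record id and is also mapped to the first term."""
    archive = ET.Element("archive", xmlns=DWC_TEXT_NS, metadata="eml.xml")
    core = ET.SubElement(archive, "core", {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",   # the literal characters backslash-t
        "linesTerminatedBy": "\\n",
        "fieldsEnclosedBy": "",
        "ignoreHeaderLines": "1",
        "rowType": DWC_TERMS + "Occurrence",
    })
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = core_file
    ET.SubElement(core, "id", index="0")
    for index, term in enumerate(terms):
        ET.SubElement(core, "field", index=str(index), term=DWC_TERMS + term)
    return ET.tostring(archive, encoding="unicode")

print(build_meta_xml(["occurrenceID", "catalogNumber", "scientificName"]))
```

Any extension would be declared in the same file with its own `<extension>` element, which uses `<coreid>` to point each extension row back to a core record.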

Jegelewicz commented 8 months ago

@tucotuco Thanks!

Nicole-Ridgwell-NMMNHS commented 8 months ago

I'd like to review how we're sending geology.

Jegelewicz commented 8 months ago

@dustymc Is there a separate mapping for media information?

dustymc commented 8 months ago

https://github.com/ArctosDB/PG_DDL/blob/master/shared_data/dwc_media.sql

Jegelewicz commented 8 months ago

AWG Member,

The first Darwin Core Mapping Workshop was held on February 12, 2024, but we still have a way to go. The AWG would like to hold a second focused workshop to continue reviewing the Arctos mapping to Darwin Core. If you are interested in participating, please add your availability in this When2Meet by Friday, February 16th, and remember to aim for two-hour blocks.

The GitHub Issue for this workshop is #7348.

Thank you!

Teresa J. Mayfield-Meyer

dustymc commented 7 months ago

Can we merge them all into one big-picture actionable remap doc

I hope there's some plan to do this?!

Jegelewicz commented 7 months ago

I think it will be easier for the community to review them one by one. Once we have them all settled, we can combine.

Nicole-Ridgwell-NMMNHS commented 2 months ago

Will the mapping issues that were closed still be addressed?

dustymc commented 2 months ago

issues that were closed, will these still be addressed?

PLEASE! Just comment here and I'll adjust the map document. (Or I can allow comments on the map doc? I'm generating SQL from it, so someone changing the functional columns could have an outsized impact. I'm up for whatever, but not smart enough to reconcile one SQL statement with edits from 50 issues!)
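If the map doc were opened to wider editing, one way to limit the outsized impact of a bad edit to the functional columns would be to check the exported rows before any SQL is generated. This is a hypothetical sketch: the `(dwc_term, sql_expression)` row shape and the small `KNOWN_DWC_TERMS` allowlist are assumptions, and a real check would load the complete Darwin Core term list.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical, deliberately tiny allowlist; a real check would load every DwC term.
KNOWN_DWC_TERMS = {
    "occurrenceID", "catalogNumber", "scientificName",
    "decimalLatitude", "decimalLongitude", "formation", "member",
}

def check_mapping(rows: List[Tuple[str, str]]) -> List[str]:
    """Return human-readable problems in (dwc_term, sql_expression) rows; empty means OK."""
    problems = []
    # Each DwC term should be produced by exactly one expression.
    counts = Counter(term for term, _ in rows)
    for term, n in sorted(counts.items()):
        if n > 1:
            problems.append(f"{term}: mapped {n} times")
    for term, expression in rows:
        if term not in KNOWN_DWC_TERMS:
            problems.append(f"{term}: not a recognized DwC term")
        if not expression.strip():
            problems.append(f"{term}: empty SQL expression")
    return problems

print(check_mapping([("occurrenceID", "flat.guid"), ("catNum", "flat.cat_num")]))
```

Failing fast on these problems keeps a typo in one row from silently dropping or duplicating a column in the generated SQL.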

dustymc commented 2 months ago

I fixed some problems with locality attributes in the DWC map. @Nicole-Ridgwell-NMMNHS here's some sample data.

temp_dwc_sample_NMMNHPaleo.csv.zip
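For reviewing how geology goes out, here is a rough sketch of mapping lithostratigraphic locality attributes onto the Darwin Core GeologicalContext terms `group`, `formation`, `member`, and `bed`. The Arctos attribute-type strings in `LITHO_TO_DWC` are hypothetical placeholders and may not match the real locality-attribute vocabulary; the example values are illustrative only.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical attribute-type names; the real Arctos locality-attribute
# vocabulary may use different strings.
LITHO_TO_DWC = {
    "lithostratigraphic group": "group",
    "lithostratigraphic formation": "formation",
    "lithostratigraphic member": "member",
    "lithostratigraphic bed": "bed",
}

def geological_context(attributes: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map one locality's (attribute_type, value) pairs onto Darwin Core
    GeologicalContext terms; unrelated attributes are ignored."""
    collected: Dict[str, List[str]] = {}
    for attr_type, value in attributes:
        term = LITHO_TO_DWC.get(attr_type)
        if term is None or not value.strip():
            continue
        collected.setdefault(term, []).append(value.strip())
    # Darwin Core recommends " | " as the separator for multiple values in one field.
    return {term: " | ".join(values) for term, values in collected.items()}

print(geological_context([
    ("lithostratigraphic formation", "Morrison Formation"),
    ("lithostratigraphic member", "Brushy Basin Member"),
    ("elevation", "1500 m"),
]))
```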

Nicole-Ridgwell-NMMNHS commented 2 months ago

Lithostratigraphy looks good in the sample data.