gbif-norway / resolver-docker

A dockerised version of the GBIF.no resolver
Apache License 2.0

Resolver to resolve material sample ids separately to the occurrence ids #19

Open rukayaj opened 2 years ago

rukayaj commented 2 years ago

We have a problem with the way we're publishing data currently: it's not possible to separately identify material samples vs occurrences for most of our records.

I don't think the resolver should issue identifiers to the data records. I think that we should be publishing the identifiers and the resolver should be resolving them.

We do have separate material sample IDs for DNA datasets from Corema, so step 1 could be to make the resolver resolve those separately.

dagendresen commented 2 years ago

(1) I think that we CAN unambiguously publish (real) Occurrences separately from voucher specimens and tissue samples, by setting basisOfRecord in the DwC-A published through the IPT. (However, I also think that the GBIF data portal and Artskart do not present these appropriately/correctly.)

(2) Agree! The resolver should not issue PIDs, only resolve them.

(3) Agree! The MaterialSamples with materialSampleIDs should be resolved by separate endpoints from the corresponding Occurrences they are linked to. The respective occurrenceID should then be an attribute of the MaterialSample endpoint metadata ...
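To make point (3) concrete, here is a minimal sketch of what separate lookups could look like. It is not the resolver's actual code: `build_indexes`, `resolve`, and the shape of the returned dictionaries are hypothetical, and records are assumed to be plain dicts keyed by Darwin Core term names.

```python
def build_indexes(records):
    """Return (occurrences, material_samples), each keyed by its own identifier."""
    occurrences = {}
    material_samples = {}
    for record in records:
        occurrence_id = record.get("occurrenceID")
        sample_id = record.get("materialSampleID")
        if occurrence_id:
            occurrences[occurrence_id] = record
        if sample_id:
            # The linked occurrenceID is kept as an attribute of the sample metadata.
            material_samples[sample_id] = {
                "materialSampleID": sample_id,
                "occurrenceID": occurrence_id,
                "record": record,
            }
    return occurrences, material_samples


def resolve(identifier, occurrences, material_samples):
    """Resolve an identifier to a typed result, or None if it is unknown."""
    if identifier in material_samples:
        return {"type": "MaterialSample", **material_samples[identifier]}
    if identifier in occurrences:
        return {
            "type": "Occurrence",
            "occurrenceID": identifier,
            "record": occurrences[identifier],
        }
    return None
```

With this layout a materialSampleID and its occurrenceID resolve to two different results, and the sample result still points back to the occurrence it came from.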

dagendresen commented 2 years ago

I think we should also extract and resolve organismIDs, eventIDs, taxonIDs, etc. (when these IDs follow a reasonable name string syntax that we can trust will be persistent ... TODO: decide on a test for the PID name syntax)
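A rough sketch of the extraction step, assuming each record is a dict keyed by Darwin Core term names. The syntax test is still undecided, so `placeholder_syntax_test` below is only a stand-in (non-empty, no whitespace) that can be swapped out. `ID_TERMS` and `extract_identifiers` are hypothetical names as well.

```python
# Darwin Core identifier terms to look for; the selection is an assumption.
ID_TERMS = ("occurrenceID", "materialSampleID", "organismID", "eventID", "taxonID")


def placeholder_syntax_test(value):
    """Stand-in for the undecided PID syntax test: non-empty and free of whitespace."""
    return bool(value) and not any(ch.isspace() for ch in value)


def extract_identifiers(record, is_trusted=placeholder_syntax_test):
    """Return {term: identifier} for every ID term in the record that passes the test."""
    found = {}
    for term in ID_TERMS:
        value = (record.get(term) or "").strip()
        if is_trusted(value):
            found[term] = value
    return found
```

Passing the test in as a parameter means the real PID syntax rule can replace the placeholder later without touching the extraction loop.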

Note also that, apart from the resolver we are building, no end-point (machine-readable or not) exists yet anywhere for the occurrenceID, materialSampleID, organismID, etc. of the Norwegian GBIF datasets.

(The global GBIF portal comes close to providing an end-point for data records, but not for any of the other real-life object classes ...).

The envisioned workflow is for the data publisher to mint (create) persistent identifiers for their MaterialSamples, Occurrences, etc., and add these to their DwC-A datasets. The envisioned agreement is that GBIF-Norway then creates the end-points for these publisher-provided persistent identifiers. Currently we support urn:uuid:UUID type identifiers (including the PURL form), but exploring other (more robust?) identifier types such as Handles and DOIs would be worthwhile.
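As an illustration of the identifier forms currently supported, the sketch below reduces both a urn:uuid:UUID string and a URL ending in a UUID to the same canonical key. It is an assumption that the PURL form is an http(s) URL whose last path segment is the UUID, and `normalize_uuid_identifier` is a hypothetical helper, not part of the resolver.

```python
import re
import uuid

# Strict hyphenated 8-4-4-4-12 hexadecimal layout.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def normalize_uuid_identifier(identifier):
    """Return the canonical lowercase UUID string, or None if not recognised."""
    text = identifier.strip()
    lowered = text.lower()
    if lowered.startswith("urn:uuid:"):
        candidate = text[len("urn:uuid:"):]
    elif lowered.startswith(("http://", "https://")):
        # Assumed PURL form: the UUID is the last path segment of the URL.
        candidate = text.rstrip("/").rsplit("/", 1)[-1]
    else:
        return None
    if UUID_RE.fullmatch(candidate) is None:
        return None
    return str(uuid.UUID(candidate))
```

Handles and DOIs would need their own branches, which is one reason to keep the syntax check in a single function.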

I think that establishing these end-points is the important rationale for the resolver :-)