CLARIAH / clariah-plus

This is the project planning repository for the CLARIAH-PLUS project. It groups all technical documents and discussions pertaining to CLARIAH-PLUS in a central place and should facilitate findability, transparency and project planning, for the project as a whole.

Take inventory of metadata from CLARIAH centers/partners #47

Open ddeboer opened 2 years ago

ddeboer commented 2 years ago

Make an inventory of the metadata formats that CLARIAH centers/partners use to publish their dataset descriptions.

  1. which format
  2. where (endpoint)
  3. how (protocol)
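One possible way to record these three questions per partner is sketched below. The `InventoryEntry` class and its field names are hypothetical, chosen only for illustration; the DANS endpoint is the one listed in the inventory in this issue, and its metadata format is left empty because the notes do not state it.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class InventoryEntry:
    """One row of the inventory (hypothetical structure, not an agreed schema)."""
    partner: str
    metadata_format: Optional[str] = None  # 1. which format
    endpoint: Optional[str] = None         # 2. where (endpoint)
    protocol: Optional[str] = None         # 3. how (protocol)

    def missing_fields(self) -> List[str]:
        """Return the names of the questions that are still unanswered."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Example entry built from the notes in this issue; the format is not stated there.
dans = InventoryEntry(
    partner="DANS",
    endpoint="http://easy.dans.knaw.nl/oai/",
    protocol="OAI-PMH",
)
print(dans.missing_fields())
```

Listing the unanswered fields makes it easy to see which partners still need a follow-up question.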

Inventory

NDE

[Note: info below comes from interviews by Femmy with various CLARIAH partners. It gives a broad overview, but details still need to be added]

KB: selections of data are delivered via various endpoints, often in the context of Europeana; LOD is made available via http://data.bibliotheken.nl/ (OAI-PMH). Other data made available through various services can be found at https://www.kb.nl/bronnen-zoekwijzers/dataservices-en-apis (also often OAI-PMH, or via Wikicommons).

IvdNT: makes language resources available via https://taalmaterialen.ivdnt.org/. Metadata openly available via endpoint (?), OAI-PMH.

IISG: metadata always open; an API delivers the metadata, OAI-PMH; it is also delivered to Europeana/WorldCat.

MPI: endpoint where metadata is made openly available via OAI-PMH.

Meertens: part of the metadata can be harvested via OAI-PMH.

Huygens: a number of data systems have their own API; older software environments work with data dumps (http://oaipmh.huygens.knaw.nl/oai).

Beeld & Geluid: Varies per collection. Metadata of ‘Open beelden’ (CC0) is harvested by Europeana, among others; ‘Open Data’ has an endpoint with CC0 metadata.

DANS: metadata always open via an OAI-PMH endpoint and also harvestable via DataCite (http://easy.dans.knaw.nl/oai/?verb=Identify).

(…)
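Since most partners above expose OAI-PMH, the "which format" question can often be answered by the endpoint itself through the standard `ListMetadataFormats` verb. The following is a minimal sketch, assuming only the Python standard library; the helper names `build_oai_url` and `parse_metadata_prefixes` are hypothetical, and `SAMPLE_RESPONSE` is a hand-written illustration, not a real response from any partner.

```python
from typing import List
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_oai_url(base_url: str, verb: str, **params: str) -> str:
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = urlencode({"verb": verb, **params})
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{query}"


def parse_metadata_prefixes(xml_text: str) -> List[str]:
    """Extract the metadataPrefix values from a ListMetadataFormats response."""
    root = ET.fromstring(xml_text)
    path = ".//oai:metadataFormat/oai:metadataPrefix"
    return [el.text for el in root.findall(path, OAI_NS)]


# Hand-written sample for illustration only (not fetched from a real endpoint).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""

# Reproduces the DANS Identify URL listed above.
print(build_oai_url("http://easy.dans.knaw.nl/oai/", "Identify"))
print(build_oai_url("http://easy.dans.knaw.nl/oai/", "ListMetadataFormats"))
print(parse_metadata_prefixes(SAMPLE_RESPONSE))
```

The sketch deliberately does no network I/O; fetching the built URL and feeding the body to `parse_metadata_prefixes` is left to whichever HTTP client the harvester already uses.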

wmelder commented 2 years ago

The Beeld & Geluid open data catalog, which describes our datasets in schema.org, is here: https://data.beeldengeluid.nl/id/datacatalog/0001
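As a rough illustration of what reading such a catalog could look like, here is a sketch that lists the datasets in a schema.org `DataCatalog` serialized as JSON-LD. `SAMPLE_CATALOG` is invented for this example and the `list_datasets` helper is hypothetical; the actual catalog at the URL above may be serialized or nested differently, so treat the shape as an assumption.

```python
import json
from typing import List, Optional, Tuple

# Invented sample, NOT a copy of the real catalog: a schema.org DataCatalog
# whose `dataset` property points at two schema.org Dataset descriptions.
SAMPLE_CATALOG = """{
  "@context": "https://schema.org/",
  "@type": "DataCatalog",
  "name": "Example catalog",
  "dataset": [
    {"@type": "Dataset", "@id": "https://example.org/dataset/1", "name": "Example dataset A"},
    {"@type": "Dataset", "@id": "https://example.org/dataset/2", "name": "Example dataset B"}
  ]
}"""


def list_datasets(catalog_json: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (identifier, name) pairs for every dataset in the catalog."""
    catalog = json.loads(catalog_json)
    datasets = catalog.get("dataset", [])
    # JSON-LD allows a single value where a list is expected.
    if isinstance(datasets, dict):
        datasets = [datasets]
    return [(d.get("@id"), d.get("name")) for d in datasets]


for identifier, name in list_datasets(SAMPLE_CATALOG):
    print(identifier, name)
```

A plain `json` parse only works for this compact form; a general solution would need a JSON-LD or RDF library to handle other serializations.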

menzowindhouwer commented 2 years ago

@femmynine and I started working on an inventory spreadsheet: https://docs.google.com/spreadsheets/d/1CJLfI-7nC5JazNMoFnVKea45FXIWELHCZOfniQqOIgc/edit?usp=sharing (contact me if you need write access)

menzowindhouwer commented 2 years ago

Added OAI endpoints to a harvester config: https://github.com/CLARIAH/harvest-config ... first trial harvest is running now ...

EnnoMeijers commented 2 years ago

KB aims to add descriptions of all its datasets to the NDE-Datasetregister, so it will not stay limited to the current LOD datasets.

menzowindhouwer commented 2 years ago

Added dummy configs for NDE and SPARQL to https://github.com/CLARIAH/harvest-config

ddeboer commented 2 years ago

@menzowindhouwer Cool! Dummy in what way? Are you still looking for feedback on the SPARQL query?

menzowindhouwer commented 2 years ago

Dummy in the sense that the harvester can't parse/execute them yet ...