Open up/extend the OAI harvester

CLARIAH / clariah-plus

This is the project planning repository for the CLARIAH-PLUS project. It groups all technical documents and discussions pertaining to CLARIAH-PLUS in a central place and should facilitate findability, transparency and project planning, for the project as a whole.


Open up/extend the OAI harvester #56

Open menzowindhouwer opened 2 years ago

menzowindhouwer commented 2 years ago

Open up/extend OAI harvester:

To speak to the CLARIAH endpoint registr(y|ies)
To speak the needed protocols
To deal with the needed formats

ddeboer commented 2 years ago

@menzowindhouwer I heard you mention that you want to connect the harvester to the NDE Register.

I’m not seeing a separate issue for that, so assuming that you referred to this one. Let me know if you and @vicding-mi need any help with building the right SPARQL queries to retrieve dataset descriptions from the NDE Register SPARQL endpoint.
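As a starting point for that conversation, here is a minimal sketch of what retrieving dataset descriptions over SPARQL could look like. It is not the harvester's actual code: the endpoint URL, the choice of `dcat:Dataset` and `dct:title`, and all function names are assumptions made for illustration, and should be checked against the NDE Register documentation. Only the request shape (SPARQL 1.1 Protocol, query via URL-encoded POST) and the result shape (SPARQL 1.1 Query Results JSON) are standard.

```python
import json
import urllib.parse
import urllib.request

# Assumption: illustrative endpoint URL; verify against the NDE Register docs.
ENDPOINT = "https://datasetregister.netwerkdigitaalerfgoed.nl/sparql"


def build_query(limit=100):
    """Return a SPARQL query selecting datasets and their titles.

    Assumption: descriptions are exposed as dcat:Dataset with dct:title.
    """
    return f"""
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {{
  ?dataset a dcat:Dataset ;
           dct:title ?title .
}} LIMIT {int(limit)}
"""


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol request (query sent as a URL-encoded POST)."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def parse_bindings(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


def fetch_datasets(endpoint=ENDPOINT, limit=100):
    """Run the query against the endpoint and return the flattened rows."""
    request = build_request(endpoint, build_query(limit))
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_bindings(json.load(response))
```

Keeping query construction, request construction and result parsing in separate functions means the parsing can be tested against a canned response without touching the network.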

wmelder commented 2 years ago

How does the NDE register handle updates? Does it also do regular updates from the sources it has registered? Would Clariah then also have to run some SPARQL queries regularly to stay up-to-date? Or is there some kind of queue where changes are pushed?

menzowindhouwer commented 2 years ago

@vicding-mi and I are preparing for a pull-like approach, i.e. CLARIAH+ regularly reruns the queries at the NDE endpoint ...
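To make the pull idea concrete, here is a rough sketch of how two successive query runs could be compared so that only new, changed or removed dataset descriptions need further processing. This illustrates the general technique, not the planned implementation: the function names, the `dataset` key and the use of a SHA-256 fingerprint per description are all assumptions.

```python
import hashlib
import json


def fingerprint(description):
    """Stable hash of one dataset description (a JSON-serialisable dict)."""
    canonical = json.dumps(description, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def snapshot(descriptions, key="dataset"):
    """Map each dataset IRI to the fingerprint of its description.

    Assumption: every description carries its IRI under `key`.
    """
    return {d[key]: fingerprint(d) for d in descriptions}


def diff_snapshots(previous, current):
    """Return sorted lists of added, changed and removed dataset IRIs."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    changed = sorted(
        iri for iri in set(current) & set(previous)
        if current[iri] != previous[iri]
    )
    return added, changed, removed
```

A scheduled job could store the snapshot from each run and pass it as `previous` on the next one. How often to rerun is a separate decision that depends on how frequently the register itself recrawls its sources.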

ddeboer commented 2 years ago

> How does the NDE register handle updates? Does it also do regular updates from the sources it has registered?

Yep: the NDE Register periodically crawls registered datasets, so updates to the dataset descriptions will end up (eventually) in CLARIAH+ as well.

wmelder commented 2 years ago

> Yep: the NDE Register periodically crawls registered datasets, so updates to the dataset descriptions will end up (eventually) in CLARIAH+ as well.

In that case, we should probably add the B&G datasets to the NDE register using the API, so that no additional Clariah harvester/crawler is needed for B&G?

ddeboer commented 2 years ago

@vicding-mi @menzowindhouwer I created #97 for information specific to the NDE Dataset Register.