CLARIAH / clariah-plus

This is the project planning repository for the CLARIAH-PLUS project. It groups all technical documents and discussions pertaining to CLARIAH-PLUS in a central place and should facilitate findability, transparency and project planning, for the project as a whole.

Add B&G datasets to the NDE dataset register #96

Closed wmelder closed 1 year ago

wmelder commented 2 years ago

Provide the relevant datasets from the B&G data catalog to the NDE dataset register using the dataset API. This way CLARIAH+ doesn't need to provide a custom method to import the B&G dataset information. See issue #56.

menzowindhouwer commented 2 years ago

Yes, CL+ will harvest B&G via the NDE triplestore.

wmelder commented 2 years ago

@menzowindhouwer: the datasets can be found here. This page provides a SPARQL query (filtering on ?dataset dct:publisher <https://www.beeldengeluid.nl>) and a link to the triple store, but that didn't seem to work (yet). Maybe some in-between steps (transformation to DCAT) are not finished yet?
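For reference, a query of the kind mentioned above could be built and URL-encoded roughly as follows. This is only a sketch: the dct:publisher pattern is taken from the link in this comment, while the prefixes, the dcat:Dataset pattern and the function name publisher_query are assumptions added for illustration, not the register's actual query.

```python
from urllib.parse import quote, unquote


def publisher_query(publisher_iri: str) -> str:
    """Build a SPARQL query selecting datasets with the given dct:publisher.

    Only the dct:publisher triple pattern comes from the thread; the
    prefixes and the dcat:Dataset pattern are assumptions.
    """
    return (
        "PREFIX dcat: <http://www.w3.org/ns/dcat#>\n"
        "PREFIX dct: <http://purl.org/dc/terms/>\n"
        "SELECT ?dataset WHERE {\n"
        "  ?dataset a dcat:Dataset .\n"
        f"  ?dataset dct:publisher <{publisher_iri}> .\n"
        "}"
    )


query = publisher_query("https://www.beeldengeluid.nl")

# Percent-encode the query so it can be passed as a URL parameter.
# Colons are left unescaped here, as they were in the link in this thread.
encoded = quote(query, safe=":")

print(query)
print(encoded)
```

Decoding the encoded string with unquote gives back the readable query, which helps when a link like the one in this comment needs to be inspected.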

ddeboer commented 2 years ago

@wmelder Nice! Good to have the BenG datasets in the Register! 👍

This page provides a SPARQL query (filtering on ?dataset dct:publisher <https://www.beeldengeluid.nl>) and a link to the triple store, but that didn't seem to work (yet).

I see 9 datasets now. Did you expect more?

For example http://data.beeldengeluid.nl/id/dataset/0026 has another publisher (Muziekweb) so is not in that list.

wmelder commented 2 years ago

@ddeboer I still see no results in the triple store, and 9 results in the search.

I expected to see 11 datasets, because that was the number of datasets that was registered.

But indeed, Muziekweb has publisher https://www.muziekweb.nl/. The other set that is missing is Natuurbeelden, which has publisher https://natuurbeelden.openbeelden.nl/. Both datasets also have B&G as publisher. Maybe that is a problem?

ddeboer commented 2 years ago

@ddeboer I still see no results in the triple store.

Did you click the ‘Run’ button in the web interface?

But indeed, Muziekweb has publisher https://www.muziekweb.nl/. The other set that is missing is Natuurbeelden, which has publisher https://natuurbeelden.openbeelden.nl/. Both datasets also have B&G as publisher. Maybe that is a problem?

Not a problem as such, but the Requirements for Datasets only allow a single schema:publisher. This is on purpose: usually it’s really a single organisation that takes care of publishing the dataset. Apparently the SPARQL query happened to pick out the organisation other than BenG in these two cases. After all, RDF guarantees no order among the triples in a graph.

You can indicate other collaborating parties with schema:creator. Would that solve your problem?
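The suggestion above could look roughly like this in a schema.org description: one publisher, with collaborating parties listed as creators. This is a minimal sketch, not a complete description accepted by the register, which may require more properties than shown. The function name describe_dataset, the "https://schema.org/" context and the choice of which organisation is the publisher are assumptions for illustration.

```python
import json


def describe_dataset(dataset_iri, name, publisher_iri, creator_iris=()):
    """Return a JSON-LD-style dict with exactly one publisher.

    Collaborating organisations go into schema:creator instead of
    being added as extra publishers.
    """
    description = {
        "@context": "https://schema.org/",  # assumed context
        "@id": dataset_iri,
        "@type": "Dataset",
        "name": name,
        # A single object, not a list: only one publisher is allowed.
        "publisher": {"@id": publisher_iri, "@type": "Organization"},
    }
    if creator_iris:
        description["creator"] = [
            {"@id": iri, "@type": "Organization"} for iri in creator_iris
        ]
    return description


# Illustrative values only: B&G as publisher, Muziekweb as creator.
example = describe_dataset(
    "http://data.beeldengeluid.nl/id/dataset/0026",
    "Muziekweb",
    publisher_iri="https://www.beeldengeluid.nl",
    creator_iris=["https://www.muziekweb.nl/"],
)

print(json.dumps(example, indent=2))
```

Keeping publisher as a single object rather than a list makes it harder to add a second publisher by accident.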

wmelder commented 2 years ago

Did you click the ‘Run’ button in the web interface?

Yes, but it keeps telling me that there are no results.

Not a problem as such, but the Requirements for Datasets only allow a single schema:publisher. This is on purpose: usually it’s really a single organisation that takes care of publishing the dataset. Apparently the SPARQL query happened to pick out the organisation other than BenG in these two cases. After all, RDF guarantees no order among the triples in a graph.

Oh, indeed. I missed that. So I will add this as an additional test before publishing. It didn't come up in the SHACL API validation, by the way.

You can indicate other collaborating parties with schema:creator. Would that solve your problem?

That's a good suggestion indeed.
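An additional pre-publication test of the kind mentioned above might be sketched like this. It assumes the dataset description is available as a list of rows with "s", "p" and "o" keys (one row per triple); the function names and the example.org sample data are hypothetical, and the check only flags subjects with more than one schema:publisher, not subjects without any.

```python
from collections import defaultdict

SCHEMA_PUBLISHER = "https://schema.org/publisher"


def publishers_per_subject(rows):
    """Group the distinct schema:publisher values by subject IRI."""
    found = defaultdict(set)
    for row in rows:
        if row["p"] == SCHEMA_PUBLISHER:
            found[row["s"]].add(row["o"])
    return dict(found)


def subjects_with_multiple_publishers(rows):
    """Return only the subjects that have more than one publisher."""
    return {
        subject: sorted(publishers)
        for subject, publishers in publishers_per_subject(rows).items()
        if len(publishers) > 1
    }


# Hypothetical sample: dataset/a has two publishers, dataset/b has one.
sample_rows = [
    {"s": "http://example.org/dataset/a", "p": SCHEMA_PUBLISHER, "o": "http://example.org/org/1"},
    {"s": "http://example.org/dataset/a", "p": SCHEMA_PUBLISHER, "o": "http://example.org/org/2"},
    {"s": "http://example.org/dataset/b", "p": SCHEMA_PUBLISHER, "o": "http://example.org/org/1"},
    {"s": "http://example.org/dataset/b", "p": "https://schema.org/name", "o": "\"B\"@en"},
]

problems = subjects_with_multiple_publishers(sample_rows)
for subject, publishers in problems.items():
    print(f"{subject} has {len(publishers)} publishers: {', '.join(publishers)}")
```

Running such a check on the source RDF before registering would catch the double-publisher case described in this thread.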

wmelder commented 2 years ago

When I re-register these datasets via the API (POST), will the datasets be overwritten?

ddeboer commented 2 years ago

Oh, indeed. I missed that. So I will add this as an additional test before publishing. It didn't come up in the SHACL API validation, by the way.

When I query the dataset manually, the RDF contains only a single publisher:

comunica-sparql  https://data.beeldengeluid.nl/id/dataset/0026  'select * {?s ?p ?o} limit 1000'
[
{"s":"http://data.beeldengeluid.nl/id/dataset/0026","p":"https://schema.org/description","o":"\"Muziekweb is de muziekbibliotheek van Nederland. Ons doel is om muziek en de informatie over muziek voor iedereen laagdrempelig aan te bieden. Muziekweb heeft sinds 1961 een collectie opgebouwd van 600.000 cd’s, 300.000 lp’s en 30.000 muziek-dvd’s (dat is samen goed voor zo'n zeven miljoen tracks!). De site is een bron van informatie over muziek die de afgelopen vijftig jaar in Nederland is uitgebracht, en een ideale plek om meer muziek te leren kennen. Deze dataset beschikt over de laatste linked data versie van het muziekweb. Hier vind je een linked data view waarin alle albums van het muziekweb zijn weergegeven.\"@nl"},
{"s":"http://data.beeldengeluid.nl/id/dataset/0026","p":"https://schema.org/distribution","o":"http://data.beeldengeluid.nl/id/datadownload/0025"},
{"s":"http://data.beeldengeluid.nl/id/dataset/0026","p":"https://schema.org/inLanguage","o":"\"nl-NL\""},
{"s":"http://data.beeldengeluid.nl/id/dataset/0026","p":"https://schema.org/includedInDataCatalog","o":"http://data.beeldengeluid.nl/id/datacatalog/0001"},
{"s":"http://data.beeldengeluid.nl/id/dataset/0026","p":"https://schema.org/license","o":"https://opendatacommons.org/licenses/by/"},
{"s":"http://data.beeldengeluid.nl/id/dataset/0026","p":"https://schema.org/mainEntityOfPage","o":"https://data.muziekweb.nl/"},
{"s":"http://data.beeldengeluid.nl/id/dataset/0026","p":"https://schema.org/name","o":"\"Muziekweb\"@nl"},
{"s":"http://data.beeldengeluid.nl/id/dataset/0026","p":"https://schema.org/publisher","o":"https://www.muziekweb.nl/"},
{"s":"http://data.beeldengeluid.nl/id/dataset/0026","p":"http://www.w3.org/1999/02/22-rdf-syntax-ns#type","o":"https://schema.org/Dataset"},
{"s":"http://data.beeldengeluid.nl/id/datadownload/0025","p":"https://schema.org/description","o":"\"Bevraag Muziekweb via SPARQL.\"@nl"},
{"s":"http://data.beeldengeluid.nl/id/datadownload/0025","p":"https://schema.org/inLanguage","o":"\"nl-NL\""},
{"s":"http://data.beeldengeluid.nl/id/datadownload/0025","p":"https://schema.org/license","o":"https://opendatacommons.org/licenses/by/"},
{"s":"http://data.beeldengeluid.nl/id/datadownload/0025","p":"https://schema.org/name","o":"\"Muziekweb als linked data\"@nl"},
{"s":"http://data.beeldengeluid.nl/id/datadownload/0025","p":"http://www.w3.org/1999/02/22-rdf-syntax-ns#type","o":"https://schema.org/DataDownload"},
{"s":"http://data.beeldengeluid.nl/id/datadownload/0025","p":"https://schema.org/contentSize","o":"\"930000\""},
{"s":"http://data.beeldengeluid.nl/id/datadownload/0025","p":"https://schema.org/contentUrl","o":"https://api.data.muziekweb.nl/datasets/MuziekwebOrganization/Muziekweb/services/Muziekweb/sparql"},
{"s":"http://data.beeldengeluid.nl/id/datadownload/0025","p":"https://schema.org/encodingFormat","o":"\"application/sparql-query\""},
{"s":"http://data.beeldengeluid.nl/id/datadownload/0025","p":"https://schema.org/usageInfo","o":"https://www.w3.org/TR/rdf-sparql-query/"},
{"s":"https://www.muziekweb.nl/","p":"https://schema.org/name","o":"\"Muziekweb\"@nl"},
{"s":"https://www.muziekweb.nl/","p":"http://www.w3.org/1999/02/22-rdf-syntax-ns#type","o":"https://schema.org/Organization"},
{"s":"https://www.muziekweb.nl/","p":"https://schema.org/sameAs","o":"http://www.wikidata.org/entity/Q18088607"}
]

When I re-register these datasets via the API (POST), will the datasets be overwritten?

Yep. Or wait a day and they should be crawled again. 😄

wmelder commented 1 year ago

All datasets have B&G as publisher now.