Informatievlaanderen / vodap

The Flemish Open Data Portal is a CKAN instance.
MIT License

Issues with circle-ci job for harvesting geoserver’s DCAT-AP #66

Open pietercolpaert opened 9 months ago

pietercolpaert commented 9 months ago

On the one hand, there are various issues with the Geoserver’s DCAT-AP export: it uses non-existent Hydra IRIs for the pagination; some prefixes are undefined, which leads to broken IRIs; the pages take very long to respond and sometimes even time out; and the distributions are blank nodes instead of being given an IRI. Can we get in contact with Geoserver to solve that? Is there a reference to the code that generates this export, so that we could open a pull request there?

On the other hand, the scripts provided in this repository at https://github.com/Informatievlaanderen/vodap/blob/master/.circleci/config.yml and https://github.com/Informatievlaanderen/vodap/blob/master/scripts/download.sh also contain an important mistake: the parser restarts blank node numbering on every page, so blank nodes from different pages end up sharing the same labels and conflict with each other.

I’ve written a small Node.js script that numbers blank nodes correctly: https://github.com/pietercolpaert/DCAT-AP-Dumps-To-Feeds/blob/main/bin/helperFlanders.ts - feel free to reuse it here.
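
To illustrate the blank node conflict described above, here is a minimal sketch of one possible fix. It is not the code from helperFlanders.ts, and it assumes each page has already been serialized as N-Triples before the pages are combined. The function name `scopeBlankNodes` and the `p<page>_` label prefix are hypothetical, and the label pattern only covers ASCII labels. The idea is to make every blank node label unique per page, while leaving IRIs and string literals untouched.

```typescript
// Hypothetical helper, not part of this repository or of helperFlanders.ts.
// Assumption: `ntriples` holds one page serialized as N-Triples.
// Every blank node label `_:x` becomes `_:p<page>_x`, so that labels
// from different pages can no longer collide when pages are concatenated.
function scopeBlankNodes(ntriples: string, page: number): string {
  // Alternatives, tried in order at each position:
  //   1. an IRI       <...>    (left unchanged)
  //   2. a literal    "..."    (left unchanged, escapes respected)
  //   3. a blank node _:label  (rewritten; simplified ASCII-only label)
  const token =
    /<[^>]*>|"(?:[^"\\]|\\.)*"|_:([A-Za-z0-9_](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?)/g;
  return ntriples.replace(token, (match: string, label: string | undefined) =>
    label === undefined ? match : `_:p${page}_${label}`
  );
}

// Two pages whose parser both started numbering at b0.
const page1 =
  '<http://example.org/dataset/1> <http://www.w3.org/ns/dcat#distribution> _:b0 .';
const page2 =
  '<http://example.org/dataset/2> <http://www.w3.org/ns/dcat#distribution> _:b0 .';

// After scoping, the two distributions keep distinct labels.
const merged = [scopeBlankNodes(page1, 1), scopeBlankNodes(page2, 2)].join("\n");
console.log(merged);
```

Without the per-page prefix, both datasets would point to the same `_:b0` once the pages are concatenated. A streaming parser that accepts a custom blank node prefix per page would achieve the same effect without the text rewriting.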

Needs input from @bertvannuffelen