ckan / ckanext-dcat

CKAN ♥ DCAT

Does the DCAT harvester have the ability to parse an entire catalog with multiple datasets and their distributions at once? #227

Closed sparsons808 closed 1 year ago

sparsons808 commented 1 year ago

Looking for clarification on this. Please advise: I need to know whether this is possible or whether I need to build my own way to do it. Given a catalog.jsonld file that holds multiple datasets and the distributions for those datasets, can CKAN ingest the file, parse it with DCAT, and load everything at once?

metaodi commented 1 year ago

Yes, this is how the DCAT harvester works. You configure the catalog endpoint of another instance as the URL of the DCAT harvester.
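As a rough illustration of that configuration step, here is a minimal sketch of the request body one might send to the `harvest_source_create` action of ckanext-harvest. The function name `build_harvest_source`, the example URL, and the source name are hypothetical; `dcat_rdf` is assumed to be the source type registered by the DCAT RDF harvester, and the optional `rdf_format` config key is assumed to be supported by the installed ckanext-dcat version. The sketch only builds the payload and does not call any API.

```python
import json


def build_harvest_source(name, catalog_url, source_type="dcat_rdf", rdf_format=None):
    """Build a payload for a harvest_source_create call (sketch, not sent anywhere)."""
    source = {
        "name": name,            # hypothetical slug for the harvest source
        "title": name,
        "url": catalog_url,      # the remote catalog endpoint to harvest
        "source_type": source_type,
    }
    if rdf_format:
        # Assumption: the harvester config is stored as a JSON string
        # and accepts an "rdf_format" key to override format detection.
        source["config"] = json.dumps({"rdf_format": rdf_format})
    return source


payload = build_harvest_source(
    "example-catalog",
    "https://example.org/catalog.jsonld",  # hypothetical endpoint
    rdf_format="json-ld",
)
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to `/api/3/action/harvest_source_create` on your own CKAN instance with an API token, or the same values entered in the harvest source form.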

sparsons808 commented 1 year ago

So basically this is only compatible with other sites using CKAN?

sparsons808 commented 1 year ago

So if I were trying to get a catalog from another source, I would configure something like data.gov/catalog.jsonld?profile=euro_dcat_ap, and this would grab data.gov's catalog and load it into my instance? @metaodi

metaodi commented 1 year ago

It's compatible with any source that provides valid DCAT RDF. I'm familiar with the setup in Switzerland, where opendata.swiss has several harvesters set up for different sources. All of those sources provide valid DCAT, but only some of them use CKAN. Others use OpenDataSoft or custom software to generate DCAT RDF.
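To make the shape of such a source concrete, here is a small sketch of a catalog with two datasets and their distributions, walked with only the standard library. The sample document, its URLs, and the helper names are made up for illustration, and the code assumes a flattened JSON-LD document with a top-level `@graph` and the `dcat:` prefix used exactly as shown. Real-world JSON-LD can take many other shapes, which is why ckanext-dcat relies on an RDF parser (rdflib) rather than plain JSON handling.

```python
import json

# Hypothetical catalog: one dcat:Catalog, two datasets, three distributions.
SAMPLE = """
{
  "@context": {"dcat": "http://www.w3.org/ns/dcat#", "dct": "http://purl.org/dc/terms/"},
  "@graph": [
    {"@id": "https://example.org/catalog", "@type": "dcat:Catalog",
     "dcat:dataset": [{"@id": "https://example.org/d/1"}, {"@id": "https://example.org/d/2"}]},
    {"@id": "https://example.org/d/1", "@type": "dcat:Dataset", "dct:title": "First",
     "dcat:distribution": {"@id": "https://example.org/d/1/csv"}},
    {"@id": "https://example.org/d/2", "@type": "dcat:Dataset", "dct:title": "Second",
     "dcat:distribution": [{"@id": "https://example.org/d/2/csv"},
                           {"@id": "https://example.org/d/2/json"}]}
  ]
}
"""


def as_list(value):
    """JSON-LD allows a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def datasets_with_distributions(doc):
    """Map each dataset @id to the @ids of its distributions."""
    result = {}
    for node in doc.get("@graph", []):
        if "dcat:Dataset" in as_list(node.get("@type")):
            refs = as_list(node.get("dcat:distribution"))
            result[node["@id"]] = [ref["@id"] for ref in refs]
    return result


for dataset, dists in datasets_with_distributions(json.loads(SAMPLE)).items():
    print(dataset, len(dists))
```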

You can actually check the config of the harvesters, e.g. here is one harvesting from another CKAN instance: https://ckan.opendata.swiss/api/3/action/harvest_source_show?id=9a2e4ae1-62b2-431d-913e-aeaa42988ec5 (or check the full list with https://ckan.opendata.swiss/api/3/action/harvest_source_list)

As you can see, the source is set to https://data.stadt-zuerich.ch/catalog.xml?fq=-tags:ktzh (so this transfers all datasets except those with the tag ktzh).
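A quick way to see what such a source URL is made of is to split it into the catalog endpoint and its query filters. This is just a sketch using the standard library; the helper name `describe_source` is hypothetical, and how the `fq` parameter is interpreted is up to the remote site.

```python
from urllib.parse import urlsplit, parse_qs


def describe_source(url):
    """Split a harvest source URL into its endpoint and any fq filters."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    return {
        "endpoint": f"{parts.scheme}://{parts.netloc}{parts.path}",
        "filters": query.get("fq", []),
    }


info = describe_source("https://data.stadt-zuerich.ch/catalog.xml?fq=-tags:ktzh")
print(info["endpoint"])  # https://data.stadt-zuerich.ch/catalog.xml
print(info["filters"])   # ['-tags:ktzh']
```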

sparsons808 commented 1 year ago

@metaodi what are your thoughts on catalogs that have parent datasets with child datasets? I have had a hard time parsing catalogs where datasets have child datasets.

metaodi commented 1 year ago

Data product is, afaik, not defined by DCAT, but DCAT 2 has "qualified relations", so you could use those to link datasets together (e.g. using dct:isPartOf).

Depending on the use case the new DCAT 3 class DatasetSeries could be a solution, too.
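For the series option, a parent/child structure could be expressed roughly as below. This is a sketch that only builds a JSON-LD document with the standard library: the function name `series_graph` and all example.org identifiers are hypothetical, while `dcat:DatasetSeries` and `dcat:inSeries` are the DCAT 3 terms for a series and for a dataset's membership in it. Whether a given ckanext-dcat version and profile maps these onto CKAN datasets during harvesting is something to check separately.

```python
import json

CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def series_graph(series_id, title, member_ids):
    """Build a JSON-LD document with one dataset series and its member datasets."""
    series = {"@id": series_id, "@type": "dcat:DatasetSeries", "dct:title": title}
    members = [
        # Each child dataset points at its parent series via dcat:inSeries.
        {"@id": member, "@type": "dcat:Dataset", "dcat:inSeries": {"@id": series_id}}
        for member in member_ids
    ]
    return {"@context": CONTEXT, "@graph": [series, *members]}


doc = series_graph(
    "https://example.org/series/budget",   # hypothetical parent
    "Budget",
    ["https://example.org/d/budget-2022", "https://example.org/d/budget-2023"],
)
print(json.dumps(doc, indent=2))
```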

sparsons808 commented 1 year ago

@metaodi thank you for this!