data-fair / data-fair

Findable, Accessible, Interoperable and Reusable Data. A complete open-source solution for your open and private data needs. French only for the time being, internationalization coming soon.
https://data-fair.github.io/3/
GNU Affero General Public License v3.0

How to harvest Data Fair catalog? #163

Open croesus opened 1 year ago

croesus commented 1 year ago

I'm the product manager for https://opennetzero.org - a search engine for net-zero datasets. Our platform uses CKAN. We'd like to be able to harvest https://data.ademe.fr/ (which uses Data Fair) and add it to our index, but we can't find an appropriate endpoint. The documentation mentions harvesting via the API but doesn't give any details on how to do this.

Is there a DCAT endpoint published by Data Fair? If not, how would you suggest we index the catalogue? Thanks!

nicolas-bonnel commented 1 year ago

Hi,

The API documentation is available here: https://data.ademe.fr/openapi-viewer/?url=https://data.ademe.fr/data-fair/api/v1/api-docs.json&proxy=false . Note that the documentation does not replace the host, and there are CORS errors.

The request for the catalog is: https://data.ademe.fr/data-fair/api/v1/datasets?size=20&page=1&owner=organization:g1pKfMqaE&publicationSites=data-fair-portals:efWMeL1ZP&visibility=public

You can increase the size parameter to get the whole catalog in one request.

The portal https://data.ademe.fr/ uses this API itself, so you can open the dev console in your browser and watch the HTTP requests as you browse the catalog.
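
For anyone scripting this, here is a minimal Python sketch of the catalog request quoted above, paging through results instead of relying on one large request. The URL and query parameters are copied from that request. The function names and the assumption that the JSON response carries the datasets under a "results" key are mine, so check them against the API documentation before relying on this.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://data.ademe.fr/data-fair/api/v1/datasets"

# Filters copied verbatim from the catalog request quoted in this thread.
FILTERS = {
    "owner": "organization:g1pKfMqaE",
    "publicationSites": "data-fair-portals:efWMeL1ZP",
    "visibility": "public",
}


def build_url(page, size=20):
    """Build the catalog URL for one page of results."""
    params = {"size": size, "page": page, **FILTERS}
    # Keep ":" unescaped so the URL matches the one quoted in the thread.
    return f"{BASE_URL}?{urlencode(params, safe=':')}"


def fetch_json(url):
    """Fetch a URL and decode the JSON body (performs a real HTTP request)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_datasets(fetch=fetch_json, size=20):
    """Yield dataset records page by page.

    Assumption: each response is a JSON object whose "results" key holds
    the list of datasets. Paging stops at the first empty or short page.
    """
    page = 1
    while True:
        payload = fetch(build_url(page, size))
        results = payload.get("results", [])
        if not results:
            return
        yield from results
        if len(results) < size:
            return
        page += 1
```

Calling list(iter_datasets(size=100)) would then collect the whole catalog. The fetch argument is only there so the paging logic can be exercised with a stub instead of the live API.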

croesus commented 1 year ago

Thank you! We'll investigate further.

Indexing a proprietary API, even a simple one, requires work specific to that platform and will delay it being included in our catalog. Please would you consider publishing a DCAT/DCAT-AP catalog as a feature of Data Fair? DCAT-AP has been adopted by the EU as the data catalog standard for public sector data portals and it makes it easier to build additional services (like ours) on top.

Thanks again for your response, and your work in helping organisations manage and publish their data.

nicolas-bonnel commented 1 year ago

Yes, publishing a DCAT catalog is on our roadmap. We'd like to be compatible with France's national data portal, and it makes sense to be harvestable by CKAN.

However, I don't have a date for this feature.