Open amercader opened 9 years ago
@brew, @tryggvib as per our chat before ^
@brew Your validation branch looks really good.
http://sweden.staging.ckanhosted.com/api/action/dcat_validation?id=test_organization_1
Some stuff to finish it off:
To make the output more useful and closer to the actual validation output can we update it to return the following?
{
  "url": "harvest source url",
  "last_validation": "harvest_job.gather_finished",
  "result": {
    "errors": "see below",
    "warnings": "see below",
    "resources": ["list of errors as we have now"]
  }
}
I think all of these are available without extra queries. For the error and warning counts, just add them if they are easy to get from the gather errors, if not don't bother for now.
Also add a dcat_validation key on the dcat_organization_list action, pointing to the new endpoint you created.
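A minimal sketch of how the proposed output could be assembled, assuming the harvest source URL, the job's gather_finished timestamp and the current list of errors are already at hand. The helper name build_validation_output and its parameters are hypothetical and not part of ckanext-sweden; the counts default to None for the case where they are not easy to get from the gather errors.

```python
from datetime import datetime


def build_validation_output(source_url, gather_finished, resource_errors,
                            error_count=None, warning_count=None):
    """Assemble the response in the proposed shape (hypothetical helper)."""
    return {
        "url": source_url,
        # Serialize the timestamp so the result is JSON friendly.
        "last_validation": (
            gather_finished.isoformat() if gather_finished else None
        ),
        "result": {
            # Counts stay None when they cannot be derived cheaply.
            "errors": error_count,
            "warnings": warning_count,
            # The list of errors as the endpoint returns it today.
            "resources": list(resource_errors),
        },
    }


example = build_validation_output(
    "http://example.com/dcat.rdf",      # placeholder source URL
    datetime(2015, 1, 2, 3, 4, 5),      # placeholder gather_finished
    ["placeholder validation error"],
)
```

Keeping the counts optional means the endpoint can ship with only the fields that need no extra queries and gain the counts later without changing the shape.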
Context
On the Swedish open data portal, datasets are harvested from DCAT metadata dumps like this one. Each dump is parsed by the ckanext-dcat harvester, which creates the corresponding CKAN datasets.
There is a CKAN organization and a CKAN harvest source for each remote organization that has its datasets imported into CKAN.
The DCAT files are validated using an external validation service:
https://validator.dcat-editor.com/
This service only supports POST requests. For example, called with the DCAT file linked before it returns this output.
We are hooking up with the validation service at this point:
https://github.com/okfn/ckanext-sweden/blob/master/ckanext/sweden/dcat/plugin.py#L21
This is called after the remote file is downloaded and before the contents are parsed and datasets created. Note that we are returning an array with validation errors. These are stored as harvest errors, more specifically GatherErrors, linked to a Harvest Job (which is linked to a Harvest Source, linked to an Organization). For instance, these errors are displayed in the harvest report page.
What's needed
On the custom dcat_organization_list action we need a dcat_validation key with the value http://{host}/organization/{id}/dcat_validation
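As a rough illustration, the key could be added to each entry of the list like this. The function name add_dcat_validation_url is hypothetical, and the sketch assumes each organization is a dict with an "id" key and that the host is passed in by the caller; the real action may get both differently.

```python
def add_dcat_validation_url(organizations, host):
    """Return copies of the organization dicts with a dcat_validation key.

    Hypothetical helper: assumes each dict carries an "id" key.
    """
    template = "http://{host}/organization/{id}/dcat_validation"
    return [
        # dict(org, ...) copies the entry, so the input list is not mutated.
        dict(org, dcat_validation=template.format(host=host, id=org["id"]))
        for org in organizations
    ]


orgs = add_dcat_validation_url(
    [{"id": "test_organization_1"}],
    "sweden.staging.ckanhosted.com",
)
```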
This endpoint should point to a custom action that returns the validation errors for the last harvest done for this organization (more precisely, the errors that occurred during the last harvest job of the organization's harvest source).
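The selection logic can be sketched with plain stand-in objects. The HarvestJob dataclass below is a simplified assumption for illustration, not the ckanext-harvest model, and last_job_errors is a hypothetical name; the sketch also assumes "last" means the job with the most recent gather_finished timestamp.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class HarvestJob:
    """Simplified stand-in for a harvest job and its gather errors."""
    gather_finished: Optional[datetime]
    gather_errors: List[str] = field(default_factory=list)


def last_job_errors(jobs):
    """Return (gather_finished, errors) for the most recent finished job."""
    # Jobs that never finished the gather stage have nothing to report.
    finished = [job for job in jobs if job.gather_finished is not None]
    if not finished:
        return None, []
    last = max(finished, key=lambda job: job.gather_finished)
    return last.gather_finished, list(last.gather_errors)


jobs = [
    HarvestJob(datetime(2015, 1, 1), ["old error"]),
    HarvestJob(datetime(2015, 2, 1), ["new error"]),
    HarvestJob(None),  # job still running
]
finished_at, errors = last_job_errors(jobs)
```

An organization with no finished job yields an empty error list rather than an exception, which keeps the endpoint usable before the first harvest has run.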
The actual output can be: