GovDataOfficial / DCAT-AP.de-SHACL-Validation

SHACL-Shapes für DCAT-AP.de
https://www.itb.ec.europa.eu/shacl/dcat-ap.de/upload
GNU Affero General Public License v3.0

How to invoke validator from application? #13

Closed ondics closed 2 years ago

ondics commented 2 years ago

We use CKAN with the dcat-ap-de extension.

We'd like to support a dataset editor directly to perform DCAT-AP.de validation.

Therefore we need to invoke the validator using a single button inside CKAN which shows the validator output for the dataset being edited, in either a new browser tab or an embedded iframe.

Is it possible to invoke the validator with a GET request?

Which parameters are required to call the validation process directly?

benjaminaaron commented 2 years ago

Is it possible to invoke the validator with a GET request?

I worked on this some time ago and remember experimenting with the REST API. Here is the documentation for it. It's also possible to define what to validate via SPARQL, see here. You can also set up a local docker instance to validate against.

ondics commented 2 years ago

Thank you @benjaminaaron, but the results are presented to the editor. So a user interface is helpful, and not only JSON output as the REST API provides.

And we want to use https://www.itb.ec.europa.eu/shacl/dcat-ap.de/upload to avoid setting up a private validator. Why should we do this if the validator is publicly available?

benjaminaaron commented 2 years ago

I can't speak to the question of how to best integrate it in your editor. Here is a short piece of JavaScript code I used to validate a GovData dataset against the publicly available validator, maybe it is useful. As a result it writes out a report.rdf file. To run it, you need npm i axios --save first and then node script.js.

// script.js
// Validates a GovData dataset against the public DCAT-AP.de validator
// and saves the returned report as report.rdf.
const axios = require('axios');
const fs = require('fs');

axios
    .post('https://www.itb.ec.europa.eu/shacl/dcat-ap.de/api/validate', {
        // URL of the RDF document to validate
        contentToValidate: "https://www.govdata.de/ckan/dataset/geologische-ubersichtskarte-der-bundesrepublik-deutschland-1-200-000-guk200-cc-7934-munchen.rdf",
        validationType: "all",
        contentSyntax: "application/rdf+xml"
    })
    .then(res => {
        console.log('data received');
        // Write the response body (the validation report) to disk
        fs.writeFile('report.rdf', res.data, err => {
            if (err) { return console.error('error writing file:', err); }
            console.log('saved to file');
        });
    })
    .catch(error => { console.error(error); });

ondics commented 2 years ago

Thank you @benjaminaaron

Is there also a simple HTML-button-solution where we can pack all parameters in the URL?

costas80 commented 2 years ago

Hi @ondics , I'm the team leader responsible for the maintenance of the validator software.

If I understood correctly you want to integrate the validator in an existing application but not via the validator's REST or SOAP API as @benjaminaaron has explained.

From your last comment I think you are referring to collecting input parameters (from your own app) and submitting them to the validator right? Is the idea to e.g. include the validator in an iframe and use it to display the validation results following your submission?

ondics commented 2 years ago

Hi @costas80, yes exactly. We are going to provide a button for a dataset author labeled Perform DCAT-AP.de validation which starts the online validator https://www.itb.ec.europa.eu/shacl/dcat-ap.de/upload.

But the dataset author should not enter form data for the currently edited dataset in the validator form. The data should be submitted by the button (e.g. most easily as GET parameters) and then the validation results show up in an iframe or a new browser tab.

We would like to integrate this validation into the authoring process: before a dataset author finishes editing a dataset, the validation should give 0 errors. At a later stage, we are thinking about making this condition mandatory, so a non-compliant dataset cannot be saved at all.

costas80 commented 2 years ago

Thanks for explaining @ondics . Interestingly we have never had this kind of request before. Either users use the validator as-is via its UI, include the full validator in an iframe, or integrate it via REST API and then present the results through their own UI. In fact what you would like is to skip the "input" part of the UI altogether and go only for the results' display (which I assume is why you requested #14 as well).

Thinking out of the box here I'm considering the following:

  1. You trigger the validation via the validator's REST API. This is the easiest public API to use for such integrations rather than replicating a UI form submission (also the endpoint we expose for UI form submissions is not necessarily maintained to be stable for third parties).
  2. In the request you specify "text/html" as the syntax of the validation report (using an input or the Accept header - as is currently the case). Normally for the REST API this syntax input is expected to be an RDF syntax (e.g. "application/rdf+xml") for the resulting report.
  3. On the validator's side we will then return as a response the full HTML/JS/CSS content for a result display. Note that in the REST API we can also include an additional input to allow you to specify whether you prefer the full or minimal UI to be rendered with the response (for your use case I think the minimal UI would be more meaningful).

Is this an approach that would work for you?

ondics commented 2 years ago

Our aim is to minimize the implementation effort to integrate the validator.

to 1.: When receiving the results as JSON we would have to build the entire presentation ourselves. This is way too much effort on our side, and we would depend on evolving JSON schemas / contents.

to 2.: You mean a POST request from our button supplying the form parameters? This is possible, although a GET request would be easier. We know and take into account that "the endpoint we expose for UI form submissions is not necessarily maintained to be stable for third parties"

to 3.: It would be sufficient if the validator returns the full UI with all possible formats (HTML, PDF, CSV...).

Our users need to work with the validator results: for multiple datasets they will mostly import the CSV into Excel and build a workflow from it. For single datasets they will go with the HTML output.

costas80 commented 2 years ago

Thanks for explaining. To be clear, the three points I mentioned above were steps of the same solution, not different alternatives. In brief, the solution I envisage is that you use the REST API, specifying in the provided input that you expect the result in HTML, in which case we return the full HTML UI output with all relevant controls (not JSON) that you can target to an iframe. Essentially you would receive the HTML for this:

[image: screenshot of the validator's result display]

Normally when using the REST API you receive the SHACL validation report in an RDF syntax, however the proposal here is that if you request HTML you get the HTML content for the full UI (from which you can then filter results, produce PDF/CSV/RDF reports etc.).

On the GET vs POST to trigger this I would remain in favour of POST given that you may be submitting a potentially large payload. In any case, from an implementation perspective this should be more or less equivalent for you, no? Is there a reason why you prefer GET?

ondics commented 2 years ago

We prefer GET because it's easier to integrate in a link button.

Thank you for your JSON explanation. We'll try this.

Where are the REST API docs? Which parameters are expected?

costas80 commented 2 years ago

The REST API is documented in the user guide with the Swagger UI here.

Keep in mind that this documentation is generic, not specific to the DCAT-AP.de validator. In the case of the DCAT-AP.de validator you would validate as follows:

Retrieve the available validation types (what you select in the "profile" dropdown):

(note: you don't need to do this before validating of course, just use it to see what are the available profiles/types)

HTTP GET to https://www.itb.ec.europa.eu/shacl/dcat-ap.de/api/info with no payload.
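As a rough sketch of the "info" call, the snippet below fetches the endpoint and pulls out the type identifiers. It assumes a runtime with a global fetch (Node 18+ or a browser) and that the response is JSON with a validationTypes array of { type, description } entries; the function names and the Accept header are illustrative choices, not part of the validator's documentation.

```javascript
// Base URL of the public DCAT-AP.de validator, as quoted in this thread.
const VALIDATOR_BASE = 'https://www.itb.ec.europa.eu/shacl/dcat-ap.de';

// Extract the "type" identifiers from an info response.
// Assumed shape: { domain: "...", validationTypes: [{ type, description }, ...] }
function extractValidationTypes(info) {
  const types = Array.isArray(info && info.validationTypes) ? info.validationTypes : [];
  return types.map(entry => entry.type);
}

// Fetch the info document and return the available validation types.
// Hypothetical helper; the Accept header is an assumption.
async function fetchValidationTypes(baseUrl = VALIDATOR_BASE) {
  const response = await fetch(`${baseUrl}/api/info`, {
    headers: { Accept: 'application/json' }
  });
  if (!response.ok) {
    throw new Error(`info request failed with HTTP ${response.status}`);
  }
  return extractValidationTypes(await response.json());
}
```

Calling fetchValidationTypes().then(console.log) would then print the identifiers that can be passed as validationType.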

Make a validation for a given target profile:

HTTP POST to https://www.itb.ec.europa.eu/shacl/dcat-ap.de/api/validate with payload:

{
  "contentToValidate": "...",
  "contentSyntax": "text/turtle",
  "embeddingMethod": "STRING",
  "validationType": "all"
}

You have various options here such as passing URLs, SPARQL queries, content in BASE64 etc. Check the user guide for explanations and examples.
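For inline Turtle content, the request above could be sent along these lines. This is only a sketch: it assumes a runtime with a global fetch (Node 18+), the helper names are made up for illustration, and the Accept header simply asks for the report as RDF/XML.

```javascript
// Build the JSON body for POST .../api/validate from inline Turtle content.
function buildValidationRequest(turtle, validationType = 'all') {
  return {
    contentToValidate: turtle,
    contentSyntax: 'text/turtle',
    embeddingMethod: 'STRING',
    validationType
  };
}

// Send the request and return the validation report as text.
// Hypothetical helper; error handling is kept minimal.
async function validateTurtle(turtle, validationType = 'all') {
  const response = await fetch('https://www.itb.ec.europa.eu/shacl/dcat-ap.de/api/validate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Accept: 'application/rdf+xml' },
    body: JSON.stringify(buildValidationRequest(turtle, validationType))
  });
  if (!response.ok) {
    throw new Error(`validation request failed with HTTP ${response.status}`);
  }
  return response.text();
}
```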

To make possible what you request you would include in your request's payload the reportSyntax parameter set to text/html (or the Accept header). For example:

{
  "contentToValidate": "...",
  "contentSyntax": "text/turtle",
  "embeddingMethod": "STRING",
  "validationType": "all",
  "reportSyntax": "text/html"
}

Important: As I mentioned, using the REST API to return the validator's HTML UI (with the results) is currently not supported. If you confirm that this approach would be what you need then we can start development on it.

costas80 commented 2 years ago

Hi @ondics ,

We just made available a new feature for the validator to cover your need to easily integrate it into an existing app. Note that we chose not to make the extension to the REST API to return HTML (as suggested earlier) as this was mixing up concerns too much. You still of course have the option of using the REST API as it currently stands (see here) but for this you'd need to develop the integration and handle the presentation yourselves.

So coming to the new feature, this allows you now to use the validator directly in an iframe within your current UI and have it only display validation results for input that your own UI submits to it. A basic implementation only requires adapting your HTML.

In the following sample, I create a basic web page with a form that includes a file input (note how the target of the form is set to point to the iframe which is initially blank):

<html>
    <head><title>Simple validator</title></head>
    <body>
        <h1>Validate your data</h1>
        <form method="POST" enctype="multipart/form-data" action="https://www.itb.ec.europa.eu/shacl/dcat-ap.de/uploadm" target="output">
            <input type="file" name="file">
            <input type="hidden" name="validationType" value="all">
            <button type="submit">Validate</button>
        </form>
        <iframe name="output" style="width:100%; height:50%;" src='about:blank'></iframe>
    </body>
</html>

Before validating the display would appear as follows:

[image: the sample page before validation]

Once you upload a file and validate, this results in the following display (note that no form inputs are displayed upon validation):

[image: the sample page showing the validation results]

In the example I used a file upload, but you can also provide a URI or direct text content. The documentation on this, including which input parameters are expected, is provided in the validator's documentation.
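If the host application generates its pages from templates, the sample form could be produced by a small helper such as the one below. This is a sketch only: the function names are hypothetical, and it reuses just the two inputs shown in the sample (file and validationType); the input names for URI or text content are not given in this thread and would need to be taken from the validator's documentation.

```javascript
// Escape a value for safe use inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Render the sample form and its target iframe for a given validation type.
// The action URL is the one used in the sample page.
function renderValidatorForm(validationType, iframeName) {
  const action = 'https://www.itb.ec.europa.eu/shacl/dcat-ap.de/uploadm';
  const target = escapeAttr(iframeName);
  return [
    `<form method="POST" enctype="multipart/form-data" action="${action}" target="${target}">`,
    '    <input type="file" name="file">',
    `    <input type="hidden" name="validationType" value="${escapeAttr(validationType)}">`,
    '    <button type="submit">Validate</button>',
    '</form>',
    `<iframe name="${target}" style="width:100%; height:50%;" src="about:blank"></iframe>`
  ].join('\n');
}
```

For example, renderValidatorForm('v11_de_spec', 'output') yields the same markup as the sample, with the profile switched to DCAT-AP.de 1.1.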

The only thing that you would need to figure out on your end is which type of validation (i.e. profile) you want to trigger. You can either provide this as a fixed value (as I do in the example) or replicate the dropdown list from the validator. The values for the different types to use can be found via the validator's REST API (see my earlier comment and the "info" operation). Currently the validation types are as follows (you would use the values from the type properties listed below):

{
  "domain": "dcat-ap.de",
  "validationTypes": [
    {
      "type": "v20_de_spec",
      "description": "DCAT-AP.de 2.0 - Specification (ALPHA)"
    },
    {
      "type": "v11_de_spec",
      "description": "DCAT-AP.de 1.1 - Specification"
    },
    {
      "type": "v210d_ap_man_v11_de_spec",
      "description": "DCAT-AP.de 1.1 Specification + DCAT-AP 2.1 Mandatory"
    },
    {
      "type": "v210d_ap_manrec_v11_de_spec",
      "description": "DCAT-AP.de 1.1 Specification + DCAT-AP 2.1 Mandatory & Recommended"
    },
    {
      "type": "v11_de_konv",
      "description": "DCAT-AP.de 1.1 GovData-Guidelines (1, 2, 4-12, 21, 30, 32)"
    },
    {
      "type": "all",
      "description": "DCAT-AP.de 1.1 Specification + GovData-Guidelines (s.o.) + DCAT-AP 2.1 Mandatory & Recommended"
    },
    {
      "type": "dashboard_alpha",
      "description": "GovData MQA (Alpha)"
    },
    {
      "type": "dashboard_live",
      "description": "GovData Quality Dashboard"
    }
  ]
}
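To replicate the dropdown, a response of this shape can be turned into a select element roughly as follows. The function names are illustrative, and the sketch assumes the response has already been parsed into a JavaScript object.

```javascript
// Escape text for safe inclusion in HTML content and attributes.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Render a <select name="validationType"> from the info response,
// preselecting one profile ("all" by default).
function renderProfileSelect(info, selectedType = 'all') {
  const options = info.validationTypes.map(({ type, description }) => {
    const selected = type === selectedType ? ' selected' : '';
    return `  <option value="${escapeHtml(type)}"${selected}>${escapeHtml(description)}</option>`;
  });
  return ['<select name="validationType">', ...options, '</select>'].join('\n');
}
```

The resulting select can replace the hidden validationType input in the form, so that the chosen profile is submitted together with the data.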

I trust this covers your needs. If anything is missing please let me know.