gbv / jskos-server

Web service to access JSKOS data
https://coli-conc.gbv.de/api/
MIT License

Validation endpoint? #156

Closed: nichtich closed this issue 2 years ago

nichtich commented 2 years ago

jskos-server might be the right place for https://github.com/gbv/jskos/issues/105:

Add POST /validate and GET /validate?url= where you can send a JSKOS document (single object or array of objects) to be validated. Optional parameter

Validation could either be agnostic to the current database content (just use the jskos-validate library >= 0.5.1 with option rememberSchemes unless a specific type is specified) or it could be a dry run of the import script. The latter would also check whether concepts and/or mappings match an existing vocabulary in the database.
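A minimal sketch of how a handler might normalize the two proposed request forms (a document in the POST body, or a `url` query parameter for GET), assuming Node.js. The helper names `toObjectArray` and `readValidateParams` are hypothetical; `url`, `type` and `ignoreUnknownFields` are the parameter names used in this thread, and treating an empty value such as `?ignoreUnknownFields` as "enabled" is an assumption.

```javascript
// Hypothetical helpers, not jskos-server's actual implementation.

// Accept either a single JSKOS object or an array of objects.
function toObjectArray(document) {
  return Array.isArray(document) ? document : [document]
}

// Read the optional parameters from the query string of a request URL.
function readValidateParams(requestUrl) {
  // The base is only needed to parse relative paths like "/validate?..."
  const { searchParams } = new URL(requestUrl, "http://localhost")
  return {
    url: searchParams.get("url"), // GET /validate?url=... (null if absent)
    type: searchParams.get("type") || undefined,
    // Assumption: "true" or an empty value counts as enabled
    ignoreUnknownFields: ["", "true"].includes(searchParams.get("ignoreUnknownFields")),
  }
}
```

Normalizing to an array up front means the rest of the handler can always produce one result per object, whether a single object or an array was sent.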

stefandesu commented 2 years ago

We don't have a fixed return format for this yet, right? I think it would be good to know not only whether the object is valid JSKOS, but also whether it can be imported (and the reason if it can't). For an array of objects, it would also be good to have that result for each of the objects.

nichtich commented 2 years ago

> We don't have a fixed return format for this yet, right?

We can just pass on the errors from jskos-validate. This should do:

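const validate = require("jskos-validate")
const { guessObjectType } = require("jskos-tools")

// input: array of JSKOS objects; params: optional request parameters
function validateInput(input, params = {}) {
  // Optionally allow fields that are not part of JSKOS
  const unknownFields = params.ignoreUnknownFields
  // Normalize an optional type name to its lowercase short form, e.g. "concept"
  const type = (guessObjectType(params.type, true) || "").toLowerCase()

  // rememberSchemes is only passed when a type is given
  const rememberSchemes = type ? [] : null
  // Use the type-specific validator (e.g. validate.concept) or the generic one
  const validator = type ? validate[type] : validate

  // One result per object: true if valid, otherwise the validator's errors
  return input.map(data => {
    const valid = validator(data, { unknownFields, rememberSchemes })
    return valid ? true : validator.errors
  })
}
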
nichtich commented 2 years ago

To include information about concept schemes stored in jskos-server, the rememberSchemes array has to be set to an array of all these concept schemes. This should be enabled via a query argument. So we have three optional arguments:

Are there other checks when importing data, e.g. detection of circular narrower links, duplicate URIs/identifier etc.?
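For illustration, two such checks over a submitted array could look roughly like this. This is a hypothetical sketch, not something jskos-validate or jskos-server does; the function names are made up, it assumes objects with `uri` and `narrower` fields, and it skips `null` entries in `narrower` (JSKOS allows `null` to mark an incomplete set).

```javascript
// Hypothetical integrity checks for a batch of JSKOS objects.

// Return the URIs that occur more than once in the batch.
function findDuplicateUris(objects) {
  const seen = new Set()
  const duplicates = new Set()
  for (const { uri } of objects) {
    if (!uri) continue
    if (seen.has(uri)) duplicates.add(uri)
    seen.add(uri)
  }
  return [...duplicates]
}

// Return true if following narrower links from some concept leads back to it
// (depth-first search that marks concepts currently on the path).
function hasNarrowerCycle(concepts) {
  const narrowerOf = new Map(
    concepts.map(c => [c.uri, (c.narrower || []).filter(Boolean).map(n => n.uri)]),
  )
  const state = new Map() // uri -> "visiting" | "done"
  const visit = uri => {
    if (state.get(uri) === "visiting") return true // back edge: cycle found
    if (state.get(uri) === "done") return false
    state.set(uri, "visiting")
    const cyclic = (narrowerOf.get(uri) || []).some(visit)
    state.set(uri, "done")
    return cyclic
  }
  return [...narrowerOf.keys()].some(visit)
}
```

Both checks only look at the submitted array; detecting conflicts with objects already in the database would additionally need a lookup there.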

stefandesu commented 2 years ago

So, the first implementation is in Dev.

I still don't fully understand how rememberSchemes would work in this context. In your example code above, rememberSchemes is only given when the type is given. But if I understand correctly, this only makes sense if validate is called twice: first with type scheme and rememberSchemes as an empty array, validating the scheme, then a second time with type concept and a rememberSchemes array that now includes the validated scheme (if successful). This is impossible to do in an HTTP endpoint. Does rememberSchemes even have a use here? (I first thought that you would pass the validation function an array that first includes the scheme and then the concepts, but looking at the code of jskos-validate, this would not work because one call to it can't validate multiple types of objects.)

> Are there other checks when importing data, e.g. detection of circular narrower links, duplicate URIs/identifier etc.?

No detection of circular narrower links or anything like that. Duplicate URIs/identifiers are also not detected directly, but trying to POST an existing object will return an error, except when bulk importing. I'm also unsure why this would be relevant for this issue. I thought we just want to check whether a JSKOS object is valid or not.

nichtich commented 2 years ago

I've updated the code and it works as expected. For instance:

[
  {
    "type": ["http://www.w3.org/2004/02/skos/core#ConceptScheme"],
    "uri": "http://example.org/voc",
    "notationPattern": "[a-z]+"
  },
  {
    "type": ["http://www.w3.org/2004/02/skos/core#Concept"],
    "uri": "http://example.org/1",
    "notation": ["abc"],
    "inScheme": [{"uri": "http://example.org/voc"}]
  },
  {
    "type": ["http://www.w3.org/2004/02/skos/core#Concept"],
    "uri": "http://example.org/2",
    "notation": ["123"],
    "inScheme": [{"uri": "http://example.org/voc"}]
  }
]

results in

[
  true,
  true,
  [
    {
      "message": "concept notation 123 does not match [a-z]+"
    }
  ]
]

With type=concept, the first object is invalid, but the third is valid because the notation is not checked.
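To make the mechanism concrete, here is a simplified, self-contained sketch of the idea behind rememberSchemes: schemes are remembered while walking the array, and later concepts are checked against the notationPattern of their scheme. It is not jskos-validate's implementation and checks only this one rule; the function name `validateBatch`, the `storedSchemes` argument (standing in for schemes already stored in jskos-server) and full-string matching of the pattern are assumptions.

```javascript
// Simplified illustration only, not jskos-validate's code.
const SCHEME = "http://www.w3.org/2004/02/skos/core#ConceptScheme"
const CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept"

// storedSchemes: schemes known before validation (hypothetical parameter)
function validateBatch(objects, storedSchemes = []) {
  const remembered = new Map(storedSchemes.map(s => [s.uri, s]))
  return objects.map(obj => {
    const types = obj.type || []
    if (types.includes(SCHEME)) {
      remembered.set(obj.uri, obj) // later concepts are checked against it
      return true
    }
    if (types.includes(CONCEPT)) {
      const errors = []
      for (const { uri } of (obj.inScheme || []).filter(Boolean)) {
        const pattern = remembered.get(uri)?.notationPattern
        if (!pattern) continue // unknown scheme or no pattern: nothing to check
        // Assumption: the pattern has to match the whole notation
        const regex = new RegExp(`^(?:${pattern})$`)
        for (const notation of obj.notation || []) {
          if (!regex.test(notation)) {
            errors.push({ message: `concept notation ${notation} does not match ${pattern}` })
          }
        }
      }
      return errors.length ? errors : true
    }
    return true
  })
}
```

Called with the example array, this sketch returns `[true, true, [{ message: "concept notation 123 does not match [a-z]+" }]]`. Called with only the third object it returns `[true]`, because the scheme is unknown, unless the scheme is passed in via `storedSchemes`.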

Question: should we rather return false instead of true on success, so that truthy result elements indicate errors?

> I thought we just want to check whether a JSKOS object is valid or not.

The validation endpoint could be used as a dry run before import, so additional integrity constraints of the import should optionally be enforced on validation as well. But that is not required here; if wanted, it's another issue.

stefandesu commented 2 years ago

> Question: should we rather return false instead of true on success, so that truthy result elements indicate errors?

That's a good point. I'm a bit split on this issue because on the one hand, I agree that it would make things slightly easier to check, but on the other hand, false only makes sense if you reverse the logic, i.e. we don't ask if it's valid, we ask if there are errors. Also, I don't think it would be too bad if you have to do a strict check for true. That makes things safer anyway.
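A short sketch of consuming the current format on the client side, assuming the results array from the example above; the names `isValid`, `failedIndexes` and `messages` are hypothetical. Because an array of errors is truthy in JavaScript, a plain `if (result)` would count failed objects as valid, which is why the strict `=== true` check matters with this format.

```javascript
// Results in the format of the example above: true or an array of errors
const results = [true, true, [{ message: "concept notation 123 does not match [a-z]+" }]]

// Strict check: only the literal value true counts as valid
const isValid = result => result === true

// Indexes of the objects that failed validation
const failedIndexes = results.flatMap((result, index) => (isValid(result) ? [] : [index]))

// All error messages, prefixed with the index of the offending object
const messages = results.flatMap((result, index) =>
  isValid(result) ? [] : result.map(error => `#${index}: ${error.message}`),
)

console.log(failedIndexes) // [ 2 ]
```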

> The validation endpoint could be used as a dry run before import, so additional integrity constraints of the import should optionally be enforced on validation as well. But that is not required here; if wanted, it's another issue.

Yeah, I'm still not sure whether that makes sense, so please make a separate issue that's of lower priority.


I guess I'll now finish the rest of the tasks, in particular tests and documentation.

stefandesu commented 2 years ago

I added the documentation with some example calls. @nichtich Could you please check the documentation to make sure there were no misunderstandings? I tried to explain how things like knownSchemes and rememberSchemes work, so it might not be 100% correct.

I will add tests after you checked it because it might change things.

nichtich commented 2 years ago

Ok, I've completed the documentation. What's still missing is the inclusion of the validation endpoint in /status and in the HTML view at /.

stefandesu commented 2 years ago

I will add some more tests tomorrow, in particular those that use the parameters, but after that I think this is finished. 👍