gbv / jskos-server

Web service to access JSKOS data
https://coli-conc.gbv.de/api/
MIT License
6 stars 4 forks source link

Add /mappings/infer endpoint #177

Closed nichtich closed 1 year ago

nichtich commented 2 years ago

This requirements is originated by coli-rich and inclusion of mappings as recommendations in DA-3. The API would be similar to /mappings with:

Response

Request

If the corresponding request to /mappings would return a non-empty list of mappings, the result should be the the same. If the result would be empty (no mapping found), however, the /mappings/infer endpoint applies additional logic to infer matching mappings from existing mappings.

The inference logic is outlined at https://github.com/gbv/jskos-server/issues/81#issuecomment-651658837 but needs some review.

A later extension of this endpoint could also support #35.

stefandesu commented 2 years ago

Is there a particular reason to make this a separate endpoint instead of an additional argument for /mappings as described in the comment you linked? I'm sure we talked about it, but I couldn't find it documented anywhere.

Also this feature requires access to the hierarchy of a vocabulary. Since we only need a mapping's ancestors, we could just query the /ancestors endpoint of the API given in the API field.

nichtich commented 2 years ago

Is there a particular reason to make this a separate endpoint

I'd prefer to not overload the already complex /mappings endpoint with additional functionality (here: infer additional mappings) but keep it limited to those mappings actually stored in the database.

Also this feature requires access to the hierarchy of a vocabulary. Since we only need a mapping's ancestors, we could just query the /ancestors endpoint of the API given in the API field.

Great!

stefandesu commented 2 years ago

I actually think this wouldn't be too difficult to implement:

  1. Call internal method for /mappings with parameters; if there are results, return those.
  2. Get ancestors for from concept (via API field of scheme)
  3. Loop through ancestors and stop when there are results for a particular ancestor
  4. Construct new mappings according to https://github.com/gbv/jskos-server/issues/81#issuecomment-651658837

Am I missing something?

It would be good to also have the following information:

nichtich commented 2 years ago

If a mapping has a URI, it is an actual mapping, so not inference was necessary. This should be enough.

yes

If a mapping was inferred, what is the source mapping? [...] maybe JSKOS field source?

Good idea to use source = [ { uri: sourceMappingUri } ].

Am I missing something?

Side note: In contrast to attempts of coli-rich, the algorithm does not tell whether the precision could be improved by intellectual creation of an additional mapping: If I get a narrowMatch to concept X, there might be the chance of a more specific exactMatch to concept Y but then Y would likely be descendant of X. So if X has no children, the best we could get intellectually is an exactMatch to X. If X has children, the recommended X is not wrong but maybe not precise enough for catalogers. As inferred mappings as primarily used as recommendation this is no problem but automatic enrichment requires an additional step.

nichtich commented 2 years ago

Handling mapping types with this issue is a bit complex so let's recap:

If passing the full query to /mappings gives a non-empty results, just return this result. Otherwise:

stefandesu commented 2 years ago

First implementation now in branch issue-177.

If mappings are inferred, they are now transformed in the following way:

Is there anything I'm missing @nichtich ?

Also one last question: In which cases should an error be thrown instead of simply returning an empty result set?

nichtich commented 2 years ago
  • Mapping type broadMatch was requested
  • fromScheme not supported (no access to API/ancestors)

empty result

  • to parameter is given (currently it is simply ignored which is also not a good solution)

Request error 40X

  • API for fromScheme is supported, but there was an error accessing it (and therefore no ancestors could be retrieved)

Server error 50x

stefandesu commented 2 years ago

Still missing:

stefandesu commented 1 year ago

Okay, from my side, this feature is pretty much finished. I've added tests to cover most cases (probably not all code paths though) and I've tested it as a mapping recommendation provider in Cocoda (only necessary change was that Cocoda needed to add the fromScheme parameter to the request which it's not doing atm; not sure if we actually need this endpoint for mappings in Cocoda).

I did notice a few things though and I need your feedback @nichtich:

  1. If strict is given, but the type is not restricted, currently mappings of type closeMatch are ignored. I think they can be included, but in this particular case the mapping type should be kept as closeMatch and not changed to narrowMatch. What do you think?
  2. Currently, this is only possible in one direction, i.e. from a certain concept. We could also adjust it so that either from or to can be given, but not both. Also there's the parameter direction for /mappings and I'm not totally sure what its implications are for inferring mappings.
  3. If certain filtering parameters like creator are given and an inferred mapping is returned, it is not possible to see that the inferred mapping is actually based on a mapping by that creator. Should this be encoded in the mapping somehow (i.e. by adding creator to mapping.source[0]) or is it okay that it's not visible in the inferred mapping?
nichtich commented 1 year ago
  1. As soon as inference via hierarchy is applied we cannot guarantee that closeMatch still applies. The more going up in the hierarchy, the less close the inferred relation is (e.g chemistry is close to alchemy but computational chemistry has nothing to do withit)
  2. Addition of to and direction would only make things more complicated for little benefit.
  3. No, giving the mapping uri as source is enough.
stefandesu commented 1 year ago

1. As soon as inference via hierarchy is applied we cannot guarantee that closeMatch still applies. The more going up in the hierarchy, the less close the inferred relation is (e.g chemistry is close to alchemy but computational chemistry has nothing to do withit)

That makes sense. However, it also applies to mappingRelation and relatedMatch where we currently apply that rule (i.e. mapping type stays the same).

2. Addition of to and direction would only make things more complicated for little benefit.

The main benefit would be that it would be easier to add to Cocoda IF we were planning to do that. But if that's not the case, we can leave it as is for now. Should we disallow the direction parameter as well then to avoid unexpected results since it's passed to the underlying mapping request?

3. No, giving the mapping uri as source is enough.

👍

nichtich commented 1 year ago

However, it also applies to mappingRelation and relatedMatch where we currently apply that rule (i.e. mapping type stays the same).

Yes but less so: less related topics are still "somehow" related. Whether to include these can be discussed and likely depends on vocabulary. I guess a limit on hierarchical traversal (e.g. only one or two step of ancestors) would be better. By now this is more an academic question on edge cases, so let's keep it as it is for now.