gbv / cocoda

A web-based tool for creating mappings between knowledge organization systems.
https://coli-conc.gbv.de/cocoda/
MIT License

Request concept schemes directly from coli-conc KOS registry in BARTOC #670

Closed stefandesu closed 2 years ago

stefandesu commented 2 years ago

There is a coli-conc KOS registry in BARTOC which can be used to query the list of schemes we need in Cocoda directly. Many of the schemes have a field API which includes details on how to access the concept data for that scheme. cocoda-sdk can already use that info and turn it into a registry object which can be used to access the data.

In theory, it shouldn't be difficult to adjust Cocoda to use this instead of the current system of querying many APIs and merging the results. In practice, there are a bunch of general and technical issues that need to be solved first.

  1. [General] If BARTOC is down, Cocoda can't be used anymore.

There are two suggestions for improving this:

a) Instead of having BARTOC proxy the requests to bartoc.org/api to our internal jskos-server instance for BARTOC, configure our webserver to do the proxying. This would remove the dependency on the BARTOC service, but it still requires the jskos-server instance with its database and all our infrastructure.

b) We could implement something like a "backup registry" that uses a static file of the list of schemes. This file could be hosted on GitHub, for example. I was considering using ONLY this file, but that would mean that changes in BARTOC wouldn't be visible in Cocoda in real time.
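The "backup registry" idea in b) could look roughly like the following. This is only a minimal sketch in Python for brevity, not cocoda-sdk code: `load_schemes`, `fetch_live`, and the `BACKUP` snapshot are hypothetical names, and the live request is passed in as a function so the fallback logic can be shown without any network access.

```python
import json
from typing import Callable, List


def load_schemes(fetch_live: Callable[[], List[dict]], backup_json: str) -> List[dict]:
    """Return schemes from the live registry, or from a static snapshot on failure."""
    try:
        return fetch_live()
    except OSError:
        # Live registry unreachable (ConnectionError and timeouts are OSError
        # subclasses): fall back to the static list of schemes.
        return json.loads(backup_json)


def failing_fetch() -> List[dict]:
    # Stand-in for a request against a registry that is currently down.
    raise ConnectionError("registry unreachable")


# Hypothetical snapshot; in practice this would be the static file's content.
BACKUP = '[{"uri": "http://example.org/scheme/a"}]'

schemes = load_schemes(failing_fetch, BACKUP)
print([scheme["uri"] for scheme in schemes])
```

With this shape, changes in the live registry stay visible as long as it is reachable, and the static file is only used during downtime.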

  2. [General] Changes in BARTOC directly transfer to Cocoda and could mess things up.

I wouldn't assume that any of the BARTOC editors would do this on purpose, but accidents happen. There's no real way to avoid this, but we could add some monitoring to check certain conditions, such as whether the API given in the API field is accessible.
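Such a monitoring check might be sketched as follows. This is an illustration under assumptions: it assumes each scheme's `API` field is a list of objects with a `url` key, `find_unreachable_apis` is a hypothetical helper, and the actual reachability test is injected as `is_reachable` so the sketch stays independent of any HTTP library.

```python
from typing import Callable, Dict, List


def find_unreachable_apis(
    schemes: List[dict], is_reachable: Callable[[str], bool]
) -> Dict[str, List[str]]:
    """Map each scheme URI to the API URLs that failed the reachability check."""
    problems: Dict[str, List[str]] = {}
    for scheme in schemes:
        # Schemes without an API field simply contribute no URLs to check.
        bad = [api["url"] for api in scheme.get("API", []) if not is_reachable(api["url"])]
        if bad:
            problems[scheme["uri"]] = bad
    return problems


# Hypothetical data: one scheme with a working API, one with a broken one.
example_schemes = [
    {"uri": "http://example.org/scheme/a", "API": [{"url": "https://ok.example.org/api/"}]},
    {"uri": "http://example.org/scheme/b", "API": [{"url": "https://down.example.org/api/"}]},
    {"uri": "http://example.org/scheme/c"},
]

report = find_unreachable_apis(example_schemes, lambda url: "down" not in url)
print(report)
```

A scheduled job could run a check like this and notify someone when the report is not empty.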

  3. [General] Many schemes in BARTOC don't have the API field set properly yet, I think.

This requires some editorial work before we push this change to the release instance of Cocoda.

  4. [Technical] Use of concepts and topConcepts fields

jskos-server sets the concepts and topConcepts fields of schemes according to whether it itself has concepts or top concepts for that particular scheme. In our case, the jskos-server instance for BARTOC only hosts schemes (apart from a few exceptions), so those fields are always set to [], indicating that there aren't any concepts. BARTOC itself, in the web UI, ignores this fact and accesses the concepts anyway, but Cocoda takes these fields into account and, for example, assumes that a scheme doesn't have any top concepts.

A potential solution would be to override/delete the two fields IF the API field was used to override the registry, because in that case we simply don't know what exactly the API offers.
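A minimal sketch of this idea, written in Python only for illustration: `drop_unknown_concept_fields` and the `registry_from_api_field` flag are hypothetical names, not part of cocoda-sdk. Removing the keys (rather than keeping `[]`) is meant to express "unknown" instead of "none".

```python
def drop_unknown_concept_fields(scheme: dict, registry_from_api_field: bool) -> dict:
    """Return a copy of the scheme without concepts/topConcepts if they are unreliable."""
    if not registry_from_api_field:
        # The registry that delivered the scheme also serves its concepts,
        # so its concepts/topConcepts values can be trusted.
        return dict(scheme)
    # The registry was derived from the API field: the values describe the
    # scheme-only server, not the API that will actually be queried.
    return {key: value for key, value in scheme.items() if key not in ("concepts", "topConcepts")}


scheme = {"uri": "http://example.org/scheme/a", "concepts": [], "topConcepts": []}
print(drop_unknown_concept_fields(scheme, registry_from_api_field=True))
```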

  5. [Technical] cocoda-sdk's cdk.getSchemes doesn't use registryForSchemes yet.

I think we did this because it would cause some schemes to become inaccessible and because of inefficiencies with the registry access. However, if we fix 3 and 6 in particular, this shouldn't be an issue anymore, so we would only need to make a small adjustment in cocoda-sdk.

A question would be whether this should be configurable (with an option flag when calling the method) or whether we should always assume the API field "works". I don't think we need to make this configurable unless there's an actual use case. Edit: I would still assume things can break and make a new major version of cocoda-sdk that includes the other changes described here. This way we can make sure that old Cocoda versions don't break even if their dependencies are freshly installed.

  6. [Technical] Inefficient initialization of registries

All registries need to be initialized, which, for JSKOS APIs, involves calling their /status endpoint. Currently, cocoda-sdk doesn't "remember" the registries it has already initialized, so if the coli-conc KOS registry contains 50 schemes from one JSKOS API, we would call the /status endpoint 50 times. This can be solved by adding a registryCache to cocoda-sdk, just like the one used in BARTOC.
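The caching idea can be illustrated with a small sketch. It is written in Python for brevity and is not the BARTOC or cocoda-sdk implementation: the `RegistryCache` class, its `get` method, and keying the cache by API URL are all assumptions made for this example. The initializer stands in for the /status call.

```python
from typing import Callable, Dict, List


class RegistryCache:
    """Initialize each registry at most once, keyed by its API URL."""

    def __init__(self, initialize: Callable[[str], dict]):
        self._initialize = initialize
        self._cache: Dict[str, dict] = {}

    def get(self, api_url: str) -> dict:
        # Only the first request for a given URL triggers initialization.
        if api_url not in self._cache:
            self._cache[api_url] = self._initialize(api_url)
        return self._cache[api_url]


status_calls: List[str] = []


def fake_status_check(api_url: str) -> dict:
    # Stand-in for calling the API's /status endpoint.
    status_calls.append(api_url)
    return {"api": api_url, "initialized": True}


cache = RegistryCache(fake_status_check)
# 50 schemes that all point to the same API share one registry object.
registries = [cache.get("https://example.org/api/") for _ in range(50)]
print(len(status_calls))  # 1
```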

  7. [Technical] Missing initialization in Cocoda causes issues

Cocoda uses a registry's registry.has.XYZ field to determine whether it offers a particular entity (like concepts). However, this field is set to undefined until initialization is complete. With Cocoda's current implementation, this is not an issue since we're initializing all registries on startup, but it would be better not to have to wait for that before the application is usable.

My suggested change is that if registry.has.XYZ is set to undefined, Cocoda will assume that we simply don't know yet and not abort a request. Either we wait for the initialization (we could easily do that), or we assume that the registry might offer the entity and just try to access it.
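The "try it if we don't know yet" variant amounts to treating the field as three-valued. A minimal sketch, using Python's `None` as a stand-in for JavaScript's `undefined`; `may_request` is a hypothetical helper name:

```python
from typing import Optional


def may_request(has_entity: Optional[bool]) -> bool:
    """Decide whether a request for an entity type should be attempted.

    True  -> the registry is known to offer the entity
    False -> the registry is known not to offer it, so abort
    None  -> not initialized yet, so assume it might and try anyway
    """
    return has_entity is not False


for state in (True, False, None):
    print(state, may_request(state))
```

The important detail is the explicit comparison against `False`: a plain truthiness check would treat "not initialized yet" the same as "not offered".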

  8. [General] We should compare the full list of schemes currently available in Cocoda (from all registries) with the list of schemes in the coli-conc KOS registry to see if we have missed any important schemes. Most of the schemes that we will "lose" are schemes from DANTE we don't need, but we might have missed something.
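The comparison itself is a set difference over scheme URIs. A rough sketch, assuming both lists are available as lists of scheme objects with a `uri` key; `missing_schemes` is a hypothetical helper, and matching only on `uri` (ignoring any alternative identifiers) is a simplification:

```python
from typing import List


def missing_schemes(current: List[dict], registry: List[dict]) -> List[str]:
    """Return URIs of schemes available now but absent from the new registry."""
    registry_uris = {scheme["uri"] for scheme in registry}
    return sorted({scheme["uri"] for scheme in current} - registry_uris)


# Hypothetical data for illustration only.
current_schemes = [
    {"uri": "http://example.org/scheme/a"},
    {"uri": "http://example.org/scheme/b"},
    {"uri": "http://example.org/scheme/c"},
]
registry_schemes = [{"uri": "http://example.org/scheme/a"}]

print(missing_schemes(current_schemes, registry_schemes))
```

The resulting list would then need a manual review to separate schemes that are intentionally dropped from those that were overlooked.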

This is a very comprehensive issue, so I'd be very grateful for some feedback @nichtich.

nichtich commented 2 years ago
  1. If BARTOC is down, Cocoda can't be used anymore.

Caching would be too complex; all we need is stable access to the jskos-server instance of the BARTOC production instance. Directly routing access to http://bartoc.org/api to this jskos-server should be enough to survive BARTOC downtimes. This way, Cocoda production depends on both jskos-server production (main database) and jskos-server-bartoc production.

  2. [General] Changes in BARTOC directly transfer to Cocoda and could mess things up.

No final solution unless we have versioning and approved edits in BARTOC. I think we can take the risk.

  3. [General] Many schemes in BARTOC don't have the API field set properly yet, I think.

Just add the field if missing.

  4. [Technical] Use of concepts and topConcepts fields

See https://github.com/gbv/jskos-server/issues/158

nichtich commented 2 years ago
  5. [Technical] cocoda-sdk's cdk.getSchemes doesn't use registryForSchemes yet.

I don't fully understand the consequences, but I always agree to unification. If the value of API does not work, we should get a timeout or error message.

  6. [Technical] Inefficient initialization of registries

Yep, caching.

  7. [Technical] Missing initialization in Cocoda causes issues

The number of registries is small, so the application can wait. This can be moved to a low-priority issue.

  8. [General] We should compare the full list of schemes currently available in Cocoda (from all registries) with the list of schemes in the coli-conc KOS registry

Yes, I've already found some instances. Could be solved together with 3. by @DavidBRohrer

stefandesu commented 2 years ago

This seems to be working very well. Will be included in a release later today.