gbv / cocoda

A web-based tool for creating mappings between knowledge organization systems.
https://coli-conc.gbv.de/cocoda/
MIT License

Get mapping recommendations via Reconciliation API #84

Closed nichtich closed 5 years ago

nichtich commented 6 years ago

The OpenRefine Reconciliation API can be used to map labels to concepts. Reconciliation API endpoints are available at least for Wikidata and for GND (http://blog.lobid.org/2018/07/02/lobid-update.html) to start with. The recommendations could be shown as an alternative to occurrences or together with existing mappings in the mapping browser.

The result fields match and score could be translated into a JSKOS mapping type (closeMatch or exactMatch?) and mappingRelevance.

nichtich commented 5 years ago

Two examples of Reconciliation requests against Wikidata in German:

The source code is available here in Python.

The same example as a Reconciliation request against GND:

$ curl --data 'queries={"q1":{"query":"verdauung"}}' https://lobid.org/gnd/reconcile | jq .
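For readers who prefer a script over curl, the same request can be sketched in Python. This is only an illustration using the standard library; the helper name build_reconcile_request is made up for this example, and the request is built but not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_reconcile_request(endpoint, labels):
    """Build a POST request with one reconciliation query per label.

    Hypothetical helper: mirrors the curl call above, where the form
    field "queries" carries a JSON object keyed by q1, q2, ...
    """
    queries = {f"q{i}": {"query": label} for i, label in enumerate(labels, start=1)}
    body = urlencode({"queries": json.dumps(queries)}).encode("utf-8")
    return Request(endpoint, data=body, method="POST")


req = build_reconcile_request("https://lobid.org/gnd/reconcile", ["verdauung"])
# To actually send it: urllib.request.urlopen(req), then json.load() the response.
```

Each key in the response (q1, q2, ...) should then hold a result list with id, name, score and match for the corresponding label.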

The relevant fields id and name of each result can be mapped to notation and prefLabel (with a fixed language per request). The mapping type must be guessed from the score (open question). The URI is built by appending the id to a namespace that is configured for each endpoint. Example of a proposed configuration format:

{
  "reconciliationProviders": [
    {
      "scheme": "http://bartoc.org/en/node/18785",
      "url": "https://lobid.org/gnd/reconcile",
      "namespace": "http://d-nb.info/gnd/"
    },
    {
      "scheme": "http://bartoc.org/en/node/1940",
      "url": "https://tools.wmflabs.org/openrefine-wikidata/{language}/api",
      "namespace": "http://www.wikidata.org/entity/"
    }
  ]
}
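The field mapping described above (id to notation, name to prefLabel, namespace plus id to URI) could look roughly like this in Python. It is a sketch under assumptions: the function name result_to_concept is hypothetical, the provider dict follows the proposed configuration format, and the sample id is a placeholder rather than a real GND identifier.

```python
def result_to_concept(result, provider, language):
    """Turn one reconciliation result into a minimal JSKOS-like concept.

    Hypothetical helper; "provider" is one entry of reconciliationProviders.
    """
    return {
        "uri": provider["namespace"] + result["id"],   # URI = namespace + id
        "notation": [result["id"]],
        "prefLabel": {language: result["name"]},       # fixed language per request
        "inScheme": [{"uri": provider["scheme"]}],
    }


gnd_provider = {
    "scheme": "http://bartoc.org/en/node/18785",
    "url": "https://lobid.org/gnd/reconcile",
    "namespace": "http://d-nb.info/gnd/",
}

# Placeholder result, not actual API output.
sample_result = {"id": "0000000-0", "name": "Verdauung", "score": 100, "match": True}
concept = result_to_concept(sample_result, gnd_provider, "de")
```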

Further extension is possible to filter by entity type (after #57 has been implemented).

stefandesu commented 5 years ago

These should appear as mapping recommendations (letter R) in MappingBrowser. We might implement this feature for 0.6.0 already because mapping recommendations are a big part of 0.6.0.

This feature depends on #168, which needs to be implemented first.

stefandesu commented 5 years ago

I'm currently trying to implement this as a provider, but I have trouble with Lobid's Reconciliation API. The Wikidata one seems to work fine, but I wasn't able to get a successful response from Lobid's API using axios, no matter what I tried. Any idea, @nichtich?

stefandesu commented 5 years ago

I added a first implementation without adding a config entry yet. To test it, add the following registry entry to the config (for Wikidata):

{
  "uri": "http://coli-conc.gbv.de/registry/wikidata-reconciliation",
  "notation": ["R"],
  "prefLabel": {
    "de": "Wikidata-Reconciliation",
    "en": "Wikidata Reconciliation"
  },
  "provider": "ReconciliationApi",
  "subject": [{
    "uri": "http://coli-conc.gbv.de/registry-group/automatic-mappings"
  }],
  "scheme": {
    "uri": "http://bartoc.org/en/node/1940"
  },
  "baseUrl": "https://tools.wmflabs.org/openrefine-wikidata/{language}/api",
  "namespace": "http://www.wikidata.org/entity/"
}

Note that the provider will only be shown if Wikidata is chosen on one side.
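To illustrate how such a registry entry might be interpreted, here is a small Python sketch. It is not Cocoda's actual provider code: the function names are hypothetical, and it assumes that {language} in baseUrl is replaced by a language code and that the registry applies when its scheme is selected on either side.

```python
def resolve_base_url(registry, language):
    """Fill the {language} placeholder of baseUrl (assumed behaviour)."""
    return registry["baseUrl"].replace("{language}", language)


def is_applicable(registry, source_scheme_uri, target_scheme_uri):
    """Return True if the registry's scheme is chosen on one of the two sides."""
    return registry["scheme"]["uri"] in (source_scheme_uri, target_scheme_uri)


registry = {
    "scheme": {"uri": "http://bartoc.org/en/node/1940"},
    "baseUrl": "https://tools.wmflabs.org/openrefine-wikidata/{language}/api",
}

url_de = resolve_base_url(registry, "de")
shown = is_applicable(
    registry,
    "http://bartoc.org/en/node/18785",  # GND on the source side
    "http://bartoc.org/en/node/1940",   # Wikidata on the target side
)
```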

The requests are pretty slow, but the results seem to be helpful. The provider already supports a local cache (just like the occurrences provider), lookups from both sides of the mapping, and mapping types (currently exact match if match is true, close match if score is 80 or above, and mapping relation otherwise).
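The mapping type rule described here can be written down as a short Python sketch. The function name mapping_type is made up for illustration; the type URIs are the standard SKOS mapping properties, and the threshold of 80 comes from the description above.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def mapping_type(result):
    """Derive a mapping type URI from a reconciliation result."""
    if result.get("match"):
        return SKOS + "exactMatch"        # API reports a definite match
    if result.get("score", 0) >= 80:
        return SKOS + "closeMatch"        # high score, but no definite match
    return SKOS + "mappingRelation"       # generic fallback


types = [
    mapping_type({"match": True, "score": 100}),
    mapping_type({"match": False, "score": 85}),
    mapping_type({"match": False, "score": 42}),
]
```

Note that the score scale is not uniform across reconciliation services, so a fixed threshold like 80 may need to be tuned per endpoint.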

While implementing this, I was wondering whether we should have one registry for each target scheme or rather create a wrapper API that receives, for example, the labels and target scheme and internally chooses the right API endpoint. That would be one more thing to maintain though...
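The core of such a wrapper would be a lookup from target scheme to endpoint. A minimal Python sketch, assuming a provider list in the configuration format proposed earlier in this thread; select_provider is a hypothetical name, and the Wikidata URL uses "en" in place of the {language} placeholder purely for illustration.

```python
PROVIDERS = [
    {
        "scheme": "http://bartoc.org/en/node/18785",
        "url": "https://lobid.org/gnd/reconcile",
        "namespace": "http://d-nb.info/gnd/",
    },
    {
        "scheme": "http://bartoc.org/en/node/1940",
        # Assumption: language fixed to "en" for this example.
        "url": "https://tools.wmflabs.org/openrefine-wikidata/en/api",
        "namespace": "http://www.wikidata.org/entity/",
    },
]


def select_provider(providers, target_scheme_uri):
    """Return the provider configured for the target scheme, or None."""
    for provider in providers:
        if provider["scheme"] == target_scheme_uri:
            return provider
    return None


gnd = select_provider(PROVIDERS, "http://bartoc.org/en/node/18785")
unknown = select_provider(PROVIDERS, "http://example.org/unknown-scheme")
```

With separate registries the same lookup happens implicitly in the client configuration, which is why the wrapper mainly trades client-side config for a service to maintain.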

Also, I'm pretty sure there's something wrong with lobid's Reconciliation API because adjusting the examples provided by the OpenRefine Wiki page does not work. I'll send them an email and ask for advice.

stefandesu commented 5 years ago

ebc9bdf9b9c20327e834b9a82e7e9b4c859b0b68 changes how the provider accesses the API, now supporting the lobid API. Unfortunately, the requests still get blocked by CORS, but I already sent them an email about that.

stefandesu commented 5 years ago

They will add the appropriate header to the API. It's already working with their test system, but I will wait until they have deployed the change to the production system before adding the registry to the default config. Afterwards, I will close this issue.