gbv / cocoda-sdk

SDK for Cocoda and coli-conc services
https://gbv.github.io/cocoda-sdk/
MIT License

LOBID seems to miss some embedded GND mappings #66

Open nichtich opened 1 month ago

nichtich commented 1 month ago

See https://lobid.org/gnd/4026894-9 and the same GND record in Cocoda: only one embedded mapping is detected, but there are more closeMatch, exactMatch, and DDC mappings.

stefandesu commented 1 month ago

Good call. Currently, only the sameAs property of the LOBID JSON record is parsed. There are also exactMatch, closeMatch, and I assume more. However, these currently only include the target concept URI, without any information about the target vocabulary (in sameAs, this is included via collection). We could simply include these without toScheme, but most applications would have trouble actually using this, I think.

nichtich commented 1 month ago

The list of target vocabularies is small and each vocabulary has a known URI namespace, these could be hardcoded.
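A minimal sketch of that idea, assuming a hand-maintained table of namespaces. The function name `guessToScheme` and the `urn:example:` scheme identifiers are placeholders for illustration, not real vocabulary URIs or anything from cocoda-sdk; the namespaces are taken from URIs that appear in this issue.

```javascript
// Hypothetical lookup table: URI namespace -> target vocabulary.
// The scheme identifiers are placeholders and would have to be replaced
// by the real vocabulary URIs.
const NAMESPACES = [
  { namespace: "http://id.loc.gov/authorities/subjects/", scheme: "urn:example:scheme:lcsh" },
  { namespace: "http://dewey.info/class/", scheme: "urn:example:scheme:ddc" },
  { namespace: "http://purl.org/bncf/tid/", scheme: "urn:example:scheme:bncf" },
]

// Return a minimal scheme object for a concept URI,
// or null if no known namespace is a prefix of the URI.
function guessToScheme(uri) {
  const entry = NAMESPACES.find(({ namespace }) => uri.startsWith(namespace))
  return entry ? { uri: entry.scheme } : null
}
```

A target URI outside all listed namespaces yields `null`, so a caller can still decide whether to keep the mapping without toScheme or drop it.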

stefandesu commented 1 month ago

> The list of target vocabularies is small and each vocabulary has a known URI namespace, these could be hardcoded.

Sounds good. Is there a list of target vocabularies for embedded mappings in GND?

nichtich commented 1 month ago

So far I've seen

This should be enough to start with.

acka47 commented 2 weeks ago

FYI, a full list of enrichments and the linking properties used in GND/EntityFacts can be found at https://wiki.dnb.de/x/TZa5C

stefandesu commented 2 weeks ago

I'm confused, as most of those listed by @nichtich are not on that list. 🤔 @acka47

acka47 commented 2 weeks ago

Sorry for the confusion. The list at https://wiki.dnb.de/x/TZa5C wasn't the best pointer for this context, as it is about the sameAs statements. For the linking sources @nichtich listed, other RDF properties are used; see his example https://lobid.org/gnd/4026894-9.

DDC:

"relatedDdcWithDegreeOfDeterminacy2" : [ {
    "id" : "http://dewey.info/class/1--0285/",
    "label" : "http://dewey.info/class/1--0285/"
  } ],
  "relatedDdcWithDegreeOfDeterminacy3" : [ {
    "id" : "http://dewey.info/class/004/",
    "label" : "http://dewey.info/class/004/"
  } ],

LCSH, RAMEAU, BNCF, EMBNE (and STW should also work like this):

"closeMatch" : [ {
    "id" : "http://id.loc.gov/authorities/subjects/sh89003285",
    "label" : "http://id.loc.gov/authorities/subjects/sh89003285"
  }, {
    "id" : "https://data.bnf.fr/ark:/12148/cb11932109b",
    "label" : "https://data.bnf.fr/ark:/12148/cb11932109b"
  }, {
    "id" : "http://purl.org/bncf/tid/1576",
    "label" : "http://purl.org/bncf/tid/1576"
  }, {
    "id" : "https://datos.bne.es/resource/XX525961",
    "label" : "https://datos.bne.es/resource/XX525961"
  } ],

However, at least RAMEAU is also linked via sameAs.
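To make this concrete, here is a rough sketch of how the additional properties could be collected from a record. It assumes the record carries its own URI in `id` and that every link property holds an array of objects with an `id`, as in the fragments above. The function name `extractLinks` and the flat result objects are made up for illustration (they are not JSKOS mappings), and the DDC properties get no mapping type because it is not settled here how degrees of determinacy should translate.

```javascript
const SKOS = "http://www.w3.org/2004/02/skos/core#"

// Link properties seen in the example record, with an assumed mapping type.
// null means: no mapping type assigned in this sketch.
const LINK_PROPERTIES = {
  exactMatch: SKOS + "exactMatch",
  closeMatch: SKOS + "closeMatch",
  relatedDdcWithDegreeOfDeterminacy2: null,
  relatedDdcWithDegreeOfDeterminacy3: null,
}

// Collect one { from, to, property, type } object per linked target.
function extractLinks(record) {
  const links = []
  for (const [property, type] of Object.entries(LINK_PROPERTIES)) {
    // Missing properties are treated as empty lists.
    for (const target of record[property] || []) {
      links.push({ from: record.id, to: target.id, property, type })
    }
  }
  return links
}

// Usage with a shortened version of the record quoted above.
const record = {
  id: "https://d-nb.info/gnd/4026894-9",
  closeMatch: [
    { id: "http://id.loc.gov/authorities/subjects/sh89003285" },
    { id: "http://purl.org/bncf/tid/1576" },
  ],
  relatedDdcWithDegreeOfDeterminacy3: [{ id: "http://dewey.info/class/004/" }],
}
const links = extractLinks(record)
```

Since RAMEAU shows up in both closeMatch and sameAs, a real implementation would probably also need to deduplicate targets across properties.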

Searching for SKOS links in lobid, I get back some more sources, e.g. $ curl "https://lobid.org/gnd/search?q=_exists_:exactMatch&size=500" | jq .member[].exactMatch[].label | sort yields concepts in these namespaces:

Searching for closeMatch ($ curl "https://lobid.org/gnd/search?q=_exists_:closeMatch&size=500" | jq .member[].closeMatch[].label | sort) adds http://purl.org/bncf/tid/, https://purl.org/bncf/tid/, and https://datos.bne.es/resource/.

Interestingly, for some sources (id.loc.gov, agrovoc, mesh, stw, BNCF) both http and https URI schemes can be found, which is probably not intended. (Ping @thoffma.)
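Until the URIs are cleaned up, a consumer matching targets against known namespaces could ignore the URI scheme. A small sketch of that workaround follows; the helper names `stripScheme` and `sameNamespace` are invented for illustration.

```javascript
// Remove a leading "http://" or "https://" so both variants compare equal.
function stripScheme(uri) {
  return uri.replace(/^https?:\/\//, "")
}

// True if the URI falls into the namespace, regardless of http vs. https.
function sameNamespace(uri, namespace) {
  return stripScheme(uri).startsWith(stripScheme(namespace))
}
```

This only affects comparison; the target URI itself is left as delivered.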

thoffma commented 2 weeks ago

Thank you for pointing this out. We will check the URIs in this regard before the next dump creation in October/November.

stefandesu commented 1 week ago

Implemented and released in v3.4.11. I also updated the Cocoda dev instance, so you can see the results for the example above here: https://coli-conc.gbv.de/cocoda/dev/?fromScheme=http%3A%2F%2Fbartoc.org%2Fen%2Fnode%2F430&from=https%3A%2F%2Fd-nb.info%2Fgnd%2F4026894-9

Possible improvements: