ckan / ckanext-dcat

CKAN ♥ DCAT

_object_value and _object_value_list return BNode identifiers #289

Open EricSoroos opened 1 week ago

EricSoroos commented 1 week ago

While reviewing the scheming PR #281, I found a couple of places where the DCAT RDF Harvester, in JSON-LD format, has trouble with in-the-wild DCAT 2.1.1 feeds, specifically an ESRI AGOL Inspire feed: https://opendata-ifigeo.hub.arcgis.com/api/feed/dcat-ap/2.1.1.json. This doesn't appear to be related to the PR, so I'm reporting it separately here.

Generally, _object_value and _object_value_list return the string value of the node. When the node has a type and something other than a direct value, that string is the internal node id of the BNode.

For example, with this (not terribly useful, but syntactically representative) provenance:

            "dct:provenance": {
                "@type": "dct:ProvenanceStatement",
                "@label": {
                    "@value": ""
                }
            },

We extract: 'provenance', ('extras', 19, 'value'): 'Nc0c0162afbe140a5afa2736468e1da4c'.

Similarly, the theme:

            "dcat:theme": {
                "@type": "skos:Concept",
                "skos:prefLabel": "Geospatial"
            },

also returns an internal node id. This is almost never a useful result, because the identifiers are ephemeral and only valid while the graph is in memory.

I'm not clear on the best course of action here, but I see a couple of options.