DerwenAI / kglab

Graph Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, NetworkX, RAPIDS, RDFlib, pySHACL, PyVis, morph-kgc, pslpython, pyarrow, etc.
https://derwen.ai/docs/kgl/
MIT License

I couldn't find a way to add type coercion in the context of the RDF graph #276

Status: Closed (psaboia closed 1 year ago)

psaboia commented 2 years ago

I'm submitting a

Current Behaviour:

I couldn't find a way to add type coercion in the context of the RDF graph. For example, I would like to produce a compact JSON-LD file that uses only terms like the one below:

{
  "@context": {
    "homepage": {
      "@id": "http://xmlns.com/foaf/0.1/homepage",
      "@type": "@id"
    }
  },
  "@id": "http://me.markus-lanthaler.com/",
  "homepage": "http://www.markus-lanthaler.com/"
}

I tried to do this through the namespaces keyword, but it didn't work. Here is my attempt:

import kglab
import rdflib

namespaces = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "homepage": { "@id": "http://xmlns.com/foaf/0.1/homepage", "@type": "@id" }
}

kg = kglab.KnowledgeGraph(
    name = "A simple KG.",
    namespaces = namespaces,
    )

kg.add(partnode, rdflib.term(kg.get_ns("homepage")), rdflib.URIRef("http://www.markus-lanthaler.com/"))

And I got the following type of error:

TypeError: Predicate {'@id': 'http://xmlns.com/foaf/0.1/homepage', '@type': '@id'} must be an rdflib term

Mec-iS commented 2 years ago

It looks like the predicate you are passing is not a valid rdflib term: "homepage" is an object, not a URI. kglab.add is just a convenience method for accessing rdflib.Graph.add; please see its reference documentation at https://rdflib.readthedocs.io/en/stable/intro_to_creating_rdf.html#adding-triples-to-a-graph

Please explain how you think this should work. We may consider adding an add_with_type method.

ceteri commented 2 years ago

Hi @psaboia ,

First, could you clarify what you mean by type coercion? My hunch is that it is about changing the type references in the JSON-LD context?

Instead, from what I see there are three problems in the code fragment above:

This might be a better way to implement the same code:

import kglab
import pathlib
import rdflib

namespaces = {
    "foaf": "http://xmlns.com/foaf/0.1/",
}

kg = kglab.KnowledgeGraph(
    name = "A simple KG.",
    namespaces = namespaces,
)

partnode = rdflib.URIRef("https://github.com/lanthaler")

kg.add(
    partnode,
    kg.get_ns("foaf").homepage,
    rdflib.URIRef("http://www.markus-lanthaler.com/"),
)

kg.save_jsonld(
    pathlib.Path("foo.json"),
    auto_compact = True,
)

which then produces file foo.json:

{
  "@context": {
    "@language": "en",
    "brick": "https://brickschema.org/schema/Brick#",
    "csvw": "http://www.w3.org/ns/csvw#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcam": "http://purl.org/dc/dcam/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcmitype": "http://purl.org/dc/dcmitype/",
    "dct": "http://purl.org/dc/terms/",
    "dcterms": "http://purl.org/dc/terms/",
    "doap": "http://usefulinc.com/ns/doap#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "odrl": "http://www.w3.org/ns/odrl/2/",
    "org": "http://www.w3.org/ns/org#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "prof": "http://www.w3.org/ns/dx/prof/",
    "prov": "http://www.w3.org/ns/prov#",
    "qb": "http://purl.org/linked-data/cube#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "schema": "https://schema.org/",
    "schema1": "http://schema.org/",
    "sh": "http://www.w3.org/ns/shacl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "sosa": "http://www.w3.org/ns/sosa/",
    "ssn": "http://www.w3.org/ns/ssn/",
    "time": "http://www.w3.org/2006/time#",
    "vann": "http://purl.org/vocab/vann/",
    "void": "http://rdfs.org/ns/void#",
    "xml": "http://www.w3.org/XML/1998/namespace",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  },
  "@id": "https://github.com/lanthaler",
  "foaf:homepage": {
    "@id": "http://www.markus-lanthaler.com/"
  }
}

Admittedly, that JSON-LD context has all of the namespaces brought in through kglab, not just the ones that are actually referenced.

We could provide a new method to remove the unused namespaces, if that would be helpful? In that case the results would look cleaner:

{
  "@context": {
    "@language": "en",
    "foaf": "http://xmlns.com/foaf/0.1/"
  },
  "@id": "https://github.com/lanthaler",
  "foaf:homepage": {
    "@id": "http://www.markus-lanthaler.com/"
  }
}

Note that RDFlib has included JSON-LD serializer support since 6.0.0, so the kwargs to kg.save_jsonld() simply pass through to the JSON-LD serializer in RDFlib. In this case I've used the auto_compact flag, which produces a compact format.

psaboia commented 1 year ago

Hi Mec-iS,

I understand that the predicate, when passed in that way, is not a valid rdflib term. Sorry for including it and obscuring my main point.

My main point was to be able to compact the IRI http://xmlns.com/foaf/0.1/homepage to the term homepage, which is specified in the context by "homepage": { "@id": "http://xmlns.com/foaf/0.1/homepage", "@type": "@id" }, so that I can use the term homepage as a predicate in the triple in the KG.

This way of compacting IRIs is described in the reference documentation at https://www.w3.org/TR/json-ld11-api/#example-8-compacted-sample-document.

I was just trying to keep using kglab to build the graph and then serialize it as compact JSON-LD.

psaboia commented 1 year ago

Hi ceteri,

Sorry for not being clear when I used the word coercion. The idea is to produce the shortest, most human-readable JSON possible.

My main point was to be able to compact the IRI http://xmlns.com/foaf/0.1/homepage to the term homepage, which is specified in the context by "homepage": { "@id": "http://xmlns.com/foaf/0.1/homepage", "@type": "@id" }, so that I can use the term homepage as a predicate in the triple in the KG.

This way of compacting IRIs is described in the reference documentation at https://www.w3.org/TR/json-ld11-api/#example-8-compacted-sample-document.

I didn't know about the auto_compact flag for saving JSON-LD. Thanks for showing another way to produce similar output. Yes, it would be helpful to have a new method to remove unused namespaces.

Thanks!