DerwenAI / kglab

Graph Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, NetworkX, RAPIDS, RDFlib, pySHACL, PyVis, morph-kgc, pslpython, pyarrow, etc.
https://derwen.ai/docs/kgl/
MIT License

help loading nquads into kglab.KnowledgeGraph #310

Closed: fils closed this issue 1 year ago

fils commented 1 year ago

I'm submitting a

Current Behaviour:

Can't see an obvious path to loading nquads

Expected Behaviour:

Be able to load N-Quads into an rdflib Graph or, if required, an rdflib.Dataset. Just converting from N-Quads to N-Triples would really be fine for my use case, but keeping the named graphs would be nice. I do see named-graph support in rdflib (multi-graph / ConjunctiveGraph), and I'm curious how to leverage it with the kglab abstractions.

Steps to reproduce:

code below

Environment:

Code example

import kglab

namespaces = {
    "shacl":   "http://www.w3.org/ns/shacl#",
    "schema":  "https://schema.org/",
    "geo":     "http://www.opengis.net/ont/geosparql#",
}

kg = kglab.KnowledgeGraph(
    name = "Schema.org based datagraph",
    base_uri = "https://example.org/id/",
    namespaces = namespaces,
)

kg.load_rdf("http://ossapi.oceaninfohub.org/public/graphs/summonedobis_v1_release.nq", format="nquads", base=None)

this seems to work, but

sparql = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT *
WHERE
{  graph ?g {
    ?s ?p ?o .
  }
}
limit 10
"""

pdf = kg.query_as_df(sparql)

gives

Exception: You performed a query operation requiring a dataset (i.e. ConjunctiveGraph), but operating currently on a single graph.

Looking at https://rdflib.readthedocs.io/en/stable/intro_to_parsing.html it would seem I could leverage named graphs with

from rdflib import Dataset
from rdflib.namespace import RDF

ds = Dataset()
ds.parse("demo.trig")

# the fourth element of each quad is the named graph it belongs to
for s, p, o, ctx in ds.quads((None, RDF.type, None, None)):
    print(s, ctx)

Reviewing https://derwen.ai/docs/kgl/ref/, it seems this support is in there, so I am curious what syntax I am missing to leverage it.

fils commented 1 year ago

OK.. I remembered..

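from rdflib import ConjunctiveGraph
import kglab

# load quad graph (namespaces is the dict defined in the first comment)
g = ConjunctiveGraph()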
g.parse("http://ossapi.oceaninfohub.org/public/graphs/summonedobis_v1_release.nq", format="nquads")
print(len(g))

kg = kglab.KnowledgeGraph(name = "OIH test", base_uri = "https://oceaninfohub.org/id/", namespaces = namespaces, import_graph = g)

But is there a way to declare this when initializing a new kglab.KnowledgeGraph? Or is the recommendation to build an rdflib ConjunctiveGraph first and pass it in via import_graph?