DerwenAI / kglab

Graph Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, NetworkX, RAPIDS, RDFlib, pySHACL, PyVis, morph-kgc, pslpython, pyarrow, etc.
https://derwen.ai/docs/kgl/
MIT License

question: can load_jsonld work with file like object StringIO? #212

Closed: fils closed this issue 2 years ago

fils commented 2 years ago

I'm submitting a

Current Behaviour:

ref: https://github.com/gleanerio/notebooks/blob/master/notebooks/validation/frame_assay.ipynb

I have several JSON-LD framing events, with the results in a list. I want to iterate through the list and load each result into a kglab KnowledgeGraph object.

This works:

import json
import pathlib

import kglab

rnamespaces = {
    "schema": "https://schema.org/",
    "shacl": "http://www.w3.org/ns/shacl#",
}

rkg = kglab.KnowledgeGraph(
    name = "Schema.org shacl eval datagraph",
    base_uri = "https://example.org/id/",
    namespaces = rnamespaces,
)

for r in results:
    # write to a file as a workaround for the path-based input pattern
    with open("/tmp/data.jsonld", "w") as f:
        f.write(json.dumps(r))  # the with block closes the file on exit

    path = pathlib.Path("/tmp/data.jsonld")
    rkg.load_jsonld(path)  # need to load as JSON-LD

However, writing, closing, reading, and overwriting a file is not elegant. I thought I could pass a "file-like object" to the load_jsonld() function, so I tried:

for r in results:
    # try a file-like object
    with io.StringIO() as f:
        f.write(json.dumps(r))

    rkg.load_jsonld(f)

but this errors out with:

ValueError: <_io.StringIO object at 0x7f765fed81f0> is not a valid string, Path, or list of Paths

I'm rather new to Python at this level, so perhaps I am doing this wrong or not following correctly. Can a StringIO or BytesIO object be used here as a "file-like object" for the load_jsonld() function?

Expected Behaviour:

I would like load_jsonld() to work with a Python "file-like" object.

Steps to reproduce:

My steps are at: https://github.com/gleanerio/notebooks/blob/master/notebooks/validation/frame_assay.ipynb

There are no line numbers, but it is in cell 41 (near the bottom).

Environment:

ceteri commented 2 years ago

We should note that our load_jsonld() and save_jsonld() methods will probably be deprecated, since JSON-LD support has been pulled directly into RDFLib v6+ and no longer requires a special plugin.

That said, our load_rdf() and load_rdf_text() methods can parse from either a path-like object or a string, respectively.

Since the StringIO class in Python makes text look like it's being read from a file, it may work better to drill down to the RDFLib methods themselves. In other words, if you have a kg object and a StringIO object s, then:

g = kg.rdf_graph()
g.parse(file=s, format="json-ld")

where you're calling rdflib.Graph.parse() directly.
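One caveat if you go the StringIO route: the buffer has to be open and rewound when the parser reads it. In the snippet from the question, the StringIO is used after its with block has exited, so it is already closed at that point, and even inside the block the cursor sits at the end of the buffer after write(). Here is a minimal stdlib-only sketch of both pitfalls. The dictionary r is a made-up stand-in for one entry of the results list, not data from the notebook:

```python
import io
import json

# Hypothetical stand-in for one framed JSON-LD result from `results`.
r = {
    "@context": {"schema": "https://schema.org/"},
    "@id": "https://example.org/id/1",
    "schema:name": "example",
}

# Pitfall 1: a StringIO is closed as soon as its `with` block exits,
# so it cannot be handed to a parser afterwards.
with io.StringIO() as f:
    f.write(json.dumps(r))
closed_after_with = f.closed

# Pitfall 2: after write(), the cursor is at the end of the buffer,
# so a reader sees no data until the buffer is rewound.
s = io.StringIO()
s.write(json.dumps(r))
text_before_seek = s.read()  # empty string: nothing left to read
s.seek(0)                    # rewind before passing `s` to a parser
text_after_seek = s.read()

# Simplest form: pass the text to the constructor; the cursor starts at 0.
s2 = io.StringIO(json.dumps(r))
```

A buffer built like s2 (or s after seek(0)) is the kind of object to pass as the file argument in the call above.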

Will that work well?

fils commented 2 years ago

@ceteri

With that info, I was able to simply try:

for r in results:
    rkg.load_rdf_text(data=json.dumps(r), format="json-ld") 

as I like to do with Turtle, and that works fine now.

So my use case is solved.

ceteri commented 2 years ago

So glad to hear! Happy Holidays!