costezki / rdf2gremlin

It has never been easier to transform your RDF data into a property graph based on TinkerPop-Gremlin.
GNU General Public License v3.0

Write to file? #15

Open fils opened 4 years ago

fils commented 4 years ago

So I was very happy to find this, as I need to load RDF into JanusGraph.

However, while I can get it to load directly into JanusGraph, I was curious whether there is a way to write the commands to a file. In our workflow it might be nice to be able to stage the writes to MinIO (an S3-compatible store) and then just feed them into JanusGraph.

Not a big deal, but if there is a way to do this I'd love to know it...

costezki commented 4 years ago

Hi Douglas, thank you for your feedback.

Currently, this tool loads the RDF triples into an in-memory rdflib graph, which is then fed bit by bit into the property graph using Gremlin commands.
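For the staging use case asked about in this thread, one option is to render the Gremlin statements as text instead of sending them to the server. The following is a minimal sketch and not rdf2gremlin's actual mapping. It assumes a simplified model in which each IRI becomes a vertex labelled `resource` with an `iri` property, IRI-valued triples become edges labelled with the predicate IRI, and literal-valued triples become vertex properties. The names `quote`, `upsert_vertex`, `triples_to_gremlin`, and `write_commands` are hypothetical, and the input is a plain list of tuples rather than an rdflib graph.

```python
from pathlib import Path


def quote(value):
    """Render a Python string as a single-quoted Groovy string literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'").replace("\n", "\\n")
    return "'" + escaped + "'"


def upsert_vertex(iri):
    """Gremlin statement that creates the vertex for an IRI only if it is missing."""
    q = quote(iri)
    return (
        f"g.V().has('resource', 'iri', {q}).fold()"
        f".coalesce(unfold(), addV('resource').property('iri', {q})).iterate()"
    )


def triples_to_gremlin(triples):
    """Yield one Gremlin-Groovy statement per item.

    Each triple is (subject, predicate, object, object_is_literal).
    This mapping is an assumption made for illustration only.
    """
    seen = set()
    for s, p, o, is_literal in triples:
        # Literals are stored as properties, so only IRIs become vertices.
        for iri in ([s] if is_literal else [s, o]):
            if iri not in seen:
                seen.add(iri)
                yield upsert_vertex(iri)
        if is_literal:
            yield (
                f"g.V().has('resource', 'iri', {quote(s)})"
                f".property({quote(p)}, {quote(o)}).iterate()"
            )
        else:
            yield (
                f"g.V().has('resource', 'iri', {quote(s)}).as('s')"
                f".V().has('resource', 'iri', {quote(o)})"
                f".addE({quote(p)}).from('s').iterate()"
            )


def write_commands(triples, path):
    """Write the statements to a text file, one per line, and return their count."""
    lines = list(triples_to_gremlin(triples))
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

The resulting file holds one statement per line, so it could be staged in an object store and replayed later through a Gremlin console or client. How the file is fed to JanusGraph is left open here.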

In your case, for staging purposes, I would write the graph data in GraphML, GraphSON, or Gryo format. These formats are native to property graphs and can be loaded in bulk.
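To give a feel for what such a staged file could look like, here is a rough sketch that writes IRI-to-IRI triples as GraphML using only the Python standard library. It is not part of rdf2gremlin; the function name `triples_to_graphml`, the `resource` vertex label, and the `iri` property are assumptions made for illustration. The `labelV` and `labelE` keys follow the convention that TinkerPop's GraphML writer uses for vertex and edge labels, as far as I know, so check this against the TinkerPop version you run.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def triples_to_graphml(triples):
    """Build a GraphML ElementTree from a list of (subject, predicate, object) IRIs.

    `triples` must be a list, because it is traversed twice:
    once to emit the nodes and once to emit the edges.
    """
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    # Declare the data keys used on nodes and edges below.
    for key_id, domain in (("labelV", "node"), ("iri", "node"), ("labelE", "edge")):
        ET.SubElement(root, "key", {
            "id": key_id, "for": domain,
            "attr.name": key_id, "attr.type": "string",
        })
    graph = ET.SubElement(root, "graph", {"id": "G", "edgedefault": "directed"})

    # First pass: one node per distinct IRI, emitted before any edge.
    node_ids = {}
    for s, _, o in triples:
        for iri in (s, o):
            if iri not in node_ids:
                node_ids[iri] = str(len(node_ids) + 1)
                node = ET.SubElement(graph, "node", {"id": node_ids[iri]})
                ET.SubElement(node, "data", {"key": "labelV"}).text = "resource"
                ET.SubElement(node, "data", {"key": "iri"}).text = iri

    # Second pass: one edge per triple, labelled with the predicate IRI.
    for n, (s, p, o) in enumerate(triples, start=1):
        edge = ET.SubElement(graph, "edge", {
            "id": f"e{n}", "source": node_ids[s], "target": node_ids[o],
        })
        ET.SubElement(edge, "data", {"key": "labelE"}).text = p

    return ET.ElementTree(root)
```

Calling `triples_to_graphml(triples).write("staged.graphml", encoding="utf-8", xml_declaration=True)` would produce a file that can be stored and loaded later. Literal-valued triples are left out of this sketch; they would need additional `key` declarations, one per property.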

What rdf2gremlin could implement is a conversion from the RDF representation into the corresponding GraphML, GraphSON, or Gryo representation. This should be quite straightforward. I will bear this in mind for the future, or, if you are willing to contribute, I can guide you on how to do it.

fils commented 4 years ago

@costezki

Thanks much for the detailed reply. I'll take a look at those three formats. With 0 knowledge of them all, I'd likely try to focus on GraphSON on the assumption that it's JSON-esque. I'm already doing quite a bit with JSON-LD, so I might be able to work from there. I'm better in Go than Python, so I might try there first. I find the Go RDF package I use has a feel much like that of Python's RDFLib, so I'll try to do it in a manner that might give me insight into giving it a go in this code base.

Given my note about 0 knowledge, is there a recommendation you would give between GraphML, GraphSON and Gryo?

Thanks much Doug

costezki commented 4 years ago

Having given this issue a second review, I think transforming RDF graphs into GraphML/GraphSON/Gryo is beyond the scope of this library. That will have to be handled with a network library such as NetworkX, for which the rdflib library already provides a conversion (https://github.com/RDFLib/rdflib/blob/master/rdflib/extras/external_graph_libs.py), although that conversion differs from the graph transformation adopted here. Unless there is another use case requiring this transformation to be serialised into a file, I will postpone this issue.