INCATools / kgcl-rdflib

Tools for working with KGCL
MIT License

Provide basic docs on how to use kgcl-tools #20

Open matentzn opened 3 years ago

matentzn commented 3 years ago

@ckindermann it would be great if we could come up with a small set of instructions on how to use the KGCL tools locally, so that we can demo them to potential supporters. Please advise on the best way to try these locally. https://kgcl.ontodev.com/ is great and works for small examples, but we can't use it with private data or bigger ontologies.

Thank you! ;)

matentzn commented 3 years ago

I also have the first proper use case now in Mondo. A user is asking for regular reports on possible changes. I am happy to work off the repo itself if someone could point me to instructions on how to "run a diff" :) Thanks.

ckindermann commented 3 years ago

@matentzn Have a look at the folder kgcl_tool on branch parser of my fork. There is a README.md that should help. :)

If you want to use things ASAP, then I suggest checking out this commit. Otherwise, I have an update coming up next week in which I prepare things to be merged into master here.

matentzn commented 3 years ago

Thanks.

matentzn commented 3 years ago

I also need some quick instructions on how to turn my ontology into N-Triples (nt) format. It would be difficult if the primary format for the KGCL tools is not a ROBOT-supported format :)

ckindermann commented 3 years ago

Right! There you go!

matentzn commented 3 years ago

This is not for you to figure out, but if we tie N-Triples to the KGCL tools, we will have to provide standard converters, at least in ROBOT or some such. However, aren't you using rdflib? Can't we simply read any format rdflib supports?

ckindermann commented 3 years ago

Yeah, that should work - and would require barely any changes as far as I can see. I will give that a whirl later this week. (Working with N-Triples is a relic from when I first started prototyping - we weren't sure then whether we'd stick with rdflib.)

jamesaoverton commented 3 years ago

The two main operations for KGCL are patch and diff. We designed them to be able to handle largish ontologies. In our experience, rdflib is slow, so we have tried not to commit too heavily to it.

For patch we read KGCL text and generate SPARQL UPDATE. You can then execute the SPARQL UPDATE with whatever tool you want, including ROBOT. KGCL will execute it using rdflib.
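To make the patch step a bit more concrete, here is a minimal sketch of what turning one label change into SPARQL UPDATE text could look like. The helper names `escape_literal` and `rename_label_update` and the query shape are assumptions for illustration only; the SPARQL that the KGCL tools actually generate may look different.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text):
    """Escape backslashes, double quotes and line breaks for a SPARQL string literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rename_label_update(subject_iri, old_label, new_label):
    """Build a SPARQL UPDATE that swaps one rdfs:label value for another.

    Hypothetical helper: it only handles plain string literals
    (no language tags or datatypes).
    """
    subject = f"<{subject_iri}>"
    old = f'"{escape_literal(old_label)}"'
    new = f'"{escape_literal(new_label)}"'
    return (
        f"DELETE {{ {subject} <{RDFS_LABEL}> {old} }}\n"
        f"INSERT {{ {subject} <{RDFS_LABEL}> {new} }}\n"
        f"WHERE  {{ {subject} <{RDFS_LABEL}> {old} }}"
    )
```

The returned string is plain SPARQL UPDATE, so it could be written to a file and run with any engine that supports updates; with rdflib that would be a call to `Graph.update(...)` on the loaded graph.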

diff is harder. We use rdflib to generate the KGCL text from two sets of triples: "before" and "after". By using N-Triples we can use standard diff tools to reduce the size of the before and after sets, then load the smaller sets in rdflib. We have tried not to require rdflib to load the full ontologies. The current version does use rdflib to load the full graph, which means that it isn't limited to N-Triples. To optimize performance you can still use N-Triples and pre-diff, but there are some caveats.
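The pre-diff idea can be sketched with plain set operations on the lines of the two N-Triples files. This is only an illustration of the principle, not the code the KGCL tools use; the function names `triple_lines` and `pre_diff` are made up for this example.

```python
def triple_lines(lines):
    """Collect N-Triples lines into a set, skipping blank lines and comments."""
    result = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            result.add(line)
    return result


def pre_diff(before_lines, after_lines):
    """Return (removed, added): triples only in 'before' and only in 'after'."""
    before = triple_lines(before_lines)
    after = triple_lines(after_lines)
    return sorted(before - after), sorted(after - before)
```

Both arguments can be open file handles, so only the two small result lists need to be loaded into rdflib afterwards. A line-based comparison like this assumes that both files were written by the same serializer and that blank node labels are stable between them; otherwise identical triples can show up as changes.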

Recent versions of ODK include apache-jena-3.12.0, which includes riot. riot will efficiently convert from any RDF format to N-Triples.

@matentzn If you can get by using riot, I think that will be the most performant option.
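For scripting the conversion, a small wrapper around riot might look like the sketch below. It assumes that `riot` is on the PATH (as it should be inside an ODK container) and that `--output=ntriples` is an accepted spelling of the output format; please check `riot --help` for your Jena version. The function names and the file names in the usage note are placeholders.

```python
import shutil
import subprocess


def riot_command(input_path, riot="riot"):
    """Build the argument list for converting an RDF file to N-Triples with riot."""
    # Assumption: riot accepts "ntriples" as a name for the N-Triples format.
    return [riot, "--output=ntriples", str(input_path)]


def convert_to_ntriples(input_path, output_path, riot="riot"):
    """Run riot and write its N-Triples output to output_path."""
    if shutil.which(riot) is None:
        raise FileNotFoundError(f"{riot!r} was not found on PATH")
    # riot writes the converted triples to stdout, so redirect that into the file.
    with open(output_path, "wb") as out:
        subprocess.run(riot_command(input_path, riot), stdout=out, check=True)
```

Calling `convert_to_ntriples("before.owl", "before.nt")` and the same for the "after" file would give two N-Triples files that can be pre-diffed with standard tools before anything is loaded into rdflib.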

matentzn commented 3 years ago

yep, that would work! Thanks!