SemanticComputing / fuseki-docker

Apache Jena Fuseki with SeCo extensions
MIT License

Please help with daily batch loading of TTLs #18

Closed: rjalexa closed this issue 1 year ago

rjalexa commented 1 year ago

Very sorry to open this as an issue, since it might just be me being an inept novice.

My knowledge-extraction processes generate new RDF triples every day, serialized as Turtle (*.ttl files), so I need to add the new triples to the existing graph daily.

I currently load the graph with the following command:

docker exec bc89541add49 ./bin/s-put http://localhost:3030/mema_v5 'default' /fuseki-base/databases/mema_ttls/all/20230722_all.ttl

This container is started from the fuseki-secoresearch-4.8.0 image, and this works well. But the next day, when I run the same command with the new TTL file, the TDB store contains only the latest triples.

From reading Jena's documentation, I think I could use something like tdb2.tdbloader --tdb ../../apache-jena-fuseki/run/configuration/test1.ttl furniture.ttl, or perhaps another utility called 'riot', but I can't find either inside the container, and I don't know enough about Docker to work out how to proceed. I would be grateful for any help.

rjalexa commented 1 year ago

I did find a /jena/bin/tdbloader2 executable in the container, but running it just prints the following message: "tdbloader2 has been renamed tdb1.xloader"

yoge1 commented 1 year ago

Thanks for the question.

If you wish to add data into an existing graph, you should use s-post instead of s-put (s-put replaces the graph's current data with the triples in the file provided).
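A minimal sketch of the daily append with s-post, reusing the container ID, dataset URL and in-container directory from the original command. It assumes (hypothetically) that each day's file keeps the `<YYYYMMDD>_all.ttl` naming; the `DRY_RUN` switch and the function names `run`, `ttl_path_for` and `load_day` are illustrative only, not part of the image.

```shell
#!/usr/bin/env bash
# Sketch: append one day's Turtle file to the default graph of the running Fuseki.
set -euo pipefail

CONTAINER="${CONTAINER:-bc89541add49}"                      # container ID from the issue; changes if the container is recreated
DATASET_URL="${DATASET_URL:-http://localhost:3030/mema_v5}"
TTL_DIR="${TTL_DIR:-/fuseki-base/databases/mema_ttls/all}"  # path inside the container

# Print the command instead of running it when DRY_RUN is set (hypothetical helper).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Assumes each day's file follows the <YYYYMMDD>_all.ttl naming seen in the issue.
ttl_path_for() {
  printf '%s/%s_all.ttl\n' "$TTL_DIR" "$1"
}

# s-post adds the file's triples to the graph; s-put would replace the graph's contents.
load_day() {
  run docker exec "$CONTAINER" ./bin/s-post "$DATASET_URL" default "$(ttl_path_for "$1")"
}
```

Sourcing the file and calling `load_day "$(date +%Y%m%d)"`, for example from a daily cron job, would append that day's file without discarding earlier triples.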

As you suggested, you can also use tdbloader to load data into a TDB database. Note that when using tdbloader, you shouldn't have a running Fuseki instance using the same database, as "A TDB dataset should only be directly accessed from a single JVM at a time otherwise data corruption may occur.". See https://jena.apache.org/documentation/tdb/commands.html.
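The offline route can be sketched as below. It assumes the dataset is TDB1 (for TDB2 the loader would be tdb2.tdbloader, if the image provides it), and that the database directory `/fuseki-base/databases/mema_v5`, the image name `secoresearch/fuseki:4.8.0` and the loader path `/jena/bin/tdbloader` match your setup; all three are hypothetical. It also assumes the database and TTL files live on Docker volumes so that `--volumes-from` can reach them. Fuseki is stopped for the whole load so that only one JVM accesses the database.

```shell
#!/usr/bin/env bash
# Sketch: offline load with tdbloader while Fuseki is stopped (single-JVM rule).
set -euo pipefail

CONTAINER="${CONTAINER:-bc89541add49}"              # Fuseki container from the issue
IMAGE="${IMAGE:-secoresearch/fuseki:4.8.0}"         # hypothetical image name/tag
DB_DIR="${DB_DIR:-/fuseki-base/databases/mema_v5}"  # hypothetical TDB1 directory
TTL_DIR="${TTL_DIR:-/fuseki-base/databases/mema_ttls/all}"

# Print the command instead of running it when DRY_RUN is set (hypothetical helper).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stop Fuseki, load the day's file into the TDB1 database, then restart Fuseki.
bulk_load_day() {
  local ttl="$TTL_DIR/$1_all.ttl"   # assumes the <YYYYMMDD>_all.ttl naming
  run docker stop "$CONTAINER"
  # --volumes-from mounts the stopped container's volumes (assumes data is on volumes);
  # --entrypoint bypasses the image's default Fuseki startup.
  run docker run --rm --volumes-from "$CONTAINER" \
    --entrypoint /jena/bin/tdbloader "$IMAGE" --loc="$DB_DIR" "$ttl"
  run docker start "$CONTAINER"
}
```

Calling it once with `DRY_RUN=1` set prints the three docker commands without touching the container, which is a cheap way to check the paths before a real run.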

Yes, tdbloader2 has apparently been renamed tdb1.xloader, which is not currently included in this secoresearch/fuseki image (please do open an issue to request it if you wish!). However, tdbloader is included. See an example of using it: https://github.com/SemanticComputing/congress-legislators/blob/master/Dockerfile.

Also, it should be noted that "xloader is not a replacement for regular TDB1 and TDB2 loaders. It is for very large datasets.". See https://jena.apache.org/documentation/tdb/tdb-xloader.html.