Smithsonian / trippi-sparql

An implementation of the Trippi RDF SPI using SPARQL Update over HTTP.
https://confluence.si.edu/display/RISSC/Trippi-SPARQL

Clear triplestore when running fedora-rebuild? #29

Open: sprater opened this issue 6 years ago

sprater commented 6 years ago

More of a question than an issue: Does the trippi-sparql library clear the triplestore when running fedora-rebuild? If not, then consider this issue a feature request.

If that would require adding functionality to Fedora 3.x (which, looking at the fedora-rebuild source code, I think might be the case), I have some documentation I can contribute about what to do prior to rebuilding the triplestore through Fedora 3.x.

ddavis commented 6 years ago

It does not. You have to handle that manually.

ajs6f commented 6 years ago

It does not. It leaves that entirely to the rebuilder utility. There is a good reason for that: there isn't any way for the client (e.g. Fedora or the rebuilder) to tell Trippi "This is a rebuild" versus "This is just normal operation", so to offer a feature like this, we would have to fork both Fedora 3 and Trippi.

Ouch.

What's your triplestore? It's usually possible to clean out a triplestore with just a SPARQL Update command or two...

sprater commented 6 years ago

Yep, that's what I did. I'm using Fuseki (Apache Jena):

# Write the update to a file; quoting 'EOF' stops the shell from expanding anything inside the SPARQL
cat <<'EOF' > cleartriplestore.ru
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX fedora-model: <info:fedora/fedora-system:def/model#>
PREFIX fedora-view: <info:fedora/fedora-system:def/view#>
PREFIX fedora-rels-ext: <info:fedora/fedora-system:def/relations-external#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

WITH <info:edu.si.fedora#ri>

DELETE { ?s ?p ?o }
WHERE { ?s ?p ?o . }
EOF

# Send the update to Fuseki's SPARQL Update endpoint using Jena's s-update script
"$FUSEKI_BASE/bin/s-update" --service=http://localhost:8080/fuseki/ri/update --file=cleartriplestore.ru

The DELETE statement could probably be made simpler (I originally wrote it to clear only certain objects' triples, by specifying the subject), and of course how the SPARQL Update actually gets executed depends on the triplestore implementation. It might be good to call out in the documentation, though, that a fedora-rebuild of the resource index does NOT clear the triplestore beforehand, so clearing it has to happen outside Fedora, before running fedora-rebuild.
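The subject-scoped use mentioned above can be sketched as a small shell function that prints a SPARQL 1.1 Update for one object. This is a minimal sketch, not part of trippi-sparql: the function name `object_delete_update` is hypothetical, and it assumes Fedora 3 resource-index subjects take the form `<info:fedora/PID>`, so check that against your own index before deleting anything.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print a SPARQL 1.1 Update that
# removes only the triples whose subject is one Fedora object in the RI graph.
# Assumes RI subjects look like <info:fedora/PID>; datastream subjects such as
# <info:fedora/PID/DC> are not matched and would be left in place.
RI_GRAPH="${RI_GRAPH:-info:edu.si.fedora#ri}"

object_delete_update() {
  # DELETE WHERE is the SPARQL 1.1 shorthand for DELETE { P } WHERE { P }.
  # It does not accept a WITH clause, so the graph is named via GRAPH instead.
  printf 'DELETE WHERE { GRAPH <%s> { <info:fedora/%s> ?p ?o } }\n' \
    "$RI_GRAPH" "$1"
}
```

For example, `object_delete_update demo:1 > deleteobject.ru` writes the update for a (hypothetical) object `demo:1`, which can then be sent with the same s-update command shown above, using `--file=deleteobject.ru`. Replacing the fixed subject with `?s` gives `DELETE WHERE { GRAPH <info:edu.si.fedora#ri> { ?s ?p ?o } }`, which removes the same triples as the `WITH ... DELETE ... WHERE` update above without the unused PREFIX lines.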

ajs6f commented 6 years ago
  1. Yeah, if you want to toast the whole graph, there's no need for all that PREFIX preface and jazz: the whole update can be DROP GRAPH <info:edu.si.fedora#ri>. Or, if your triplestore supports the SPARQL Graph Store HTTP Protocol (as a Jena committer, I'm glad to report that Fuseki does), it's just a single HTTP DELETE.

  2. Agreed about the desirability of documentation. PR? 🙏
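Both whole-graph options can be sketched with curl. This is a minimal sketch under stated assumptions, not a documented trippi-sparql or Fedora procedure: it assumes the dataset is served at `http://localhost:8080/fuseki/ri` with Fuseki's usual `update` (SPARQL Update) and `data` (read-write Graph Store) services, and the function names `run`, `drop_ri_graph` and `delete_ri_graph` are hypothetical.

```shell
#!/bin/sh
# Sketch: clear the whole RI graph in Fuseki before running fedora-rebuild.
# Assumptions (not from the thread): the dataset lives at the URL below and
# exposes Fuseki's usual "update" and "data" services; override via env vars.
FUSEKI_DATASET="${FUSEKI_DATASET:-http://localhost:8080/fuseki/ri}"
RI_GRAPH="${RI_GRAPH:-info:edu.si.fedora#ri}"

# Run a command, or only print it (without shell quoting) when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Option 1: DROP GRAPH over the standard SPARQL 1.1 protocol, i.e. a
# form-encoded "update" parameter POSTed to the update service.
# SILENT keeps the request from failing if the graph does not exist.
drop_ri_graph() {
  run curl -fsS --data-urlencode "update=DROP SILENT GRAPH <$RI_GRAPH>" \
    "$FUSEKI_DATASET/update"
}

# Option 2: one HTTP DELETE via the SPARQL 1.1 Graph Store HTTP Protocol.
# -G moves the url-encoded graph IRI into the query string ('#' becomes %23)
# and -X DELETE sets the method. The GSP spec answers 404 for a graph that
# does not exist, which -f turns into a non-zero exit status.
delete_ri_graph() {
  run curl -fsS -G --data-urlencode "graph=$RI_GRAPH" -X DELETE \
    "$FUSEKI_DATASET/data"
}
```

`DRY_RUN=1 delete_ri_graph` prints the curl command without sending anything; calling `delete_ri_graph` (or `drop_ri_graph`) for real before starting fedora-rebuild empties the graph. Option 2 is the single HTTP DELETE described above, while Option 1 only relies on SPARQL 1.1 Update, so it should carry over to other stores that trippi-sparql talks to. Percent-encoding the graph IRI matters in Option 2 because an unencoded `#` would be read as the start of a URL fragment.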

sprater commented 6 years ago

will do!

sprater commented 6 years ago

I bet you thought I forgot. I did, but then I remembered. PR #30

ddavis commented 6 years ago

I am working on Docker images for Fuseki and F3 (derived from ones started by @ajs6f) that include working out how to rebuild, likely by wiping out the persistent storage. I hope no code changes are needed, though the rebuilder no longer exits cleanly (it completes but hangs without returning to the shell).

sprater commented 6 years ago

I've repeatedly had that problem since the 3.7.x branch, so it most likely predates your changes.

ajs6f commented 6 years ago

Regarding Docker images, @sprater, are you familiar with ISLE?