assemblee-virtuelle / semapps

A toolbox to create semantic web applications
https://semapps.org
Apache License 2.0

Stop using a different dataset for WAC permissions #1001

Open srosset81 opened 2 years ago

srosset81 commented 2 years ago

As indicated by @nikoPLP in his message below, it's possible that with a more recent version of Fuseki, we could simplify the configuration of ACLs, and thus enable hot compaction.

With the version of Jena Fuseki we're using (3.17.0), there's a bug that prevents me from configuring the ACL graph as I'd like. It's a bug about transactions. When ACLs are enabled, Jena has two graphs open, the default graph and the ACLs graph. They're both protected by my Java code, and basically, they're both stored in the same dataset on disk. When I make requests on the ACL graph in the middle of a transaction on the default graph (which happens all the time, when I need to check permissions), Jena has trouble combining the two transactions. There's one transaction open on the default graph, and another on the ACL graph. In short, it crashes. It's a Jena bug. But I don't know if they've fixed it or not in the new versions.

To get around this bug, I had to configure the 2 graphs a little differently. They each have their own dataset. As you may have noticed, when ACLs are activated, an additional dataset is created, with the name localDataAcl. This is how I avoided the bug. By putting the data from the 2 graphs in two different datasets, there's no longer any problem with shared transactions, and it goes away. It makes configuring the datasets a little more complicated, but it works.

To test whether the bug has been corrected, you'd need to update Jena, go back to a simpler configuration (all named graphs in the same dataset with the default graph) and test.

Now to come back to hot compaction, it's buggy when the configuration is complicated, as it is for us. Hot compaction works well when there's a single dataset, but when there are two datasets associated together, as in our configuration, hot compaction no longer works. This is another Jena bug.

So you have to do cold compaction. And that's why I've created this compact service that works cold, when Jena is stopped. It does the compaction and you can restart Jena afterwards.

If the first bug is fixed (and I have no idea if it is) then dataset configuration could be simplified, and hot compacting would even work with ACLs.

The other advantage is that it would also fix the https://github.com/assemblee-virtuelle/semapps/issues/871 bug.

It's a big job, and one that should be funded.

nikoPLP commented 2 years ago

If it worked, we'd also have to make a migration tool for configuring ACLs, to recombine the 2 graphs (the 3 in fact, now that we have a graph for the mirror) into a single dataset.

srosset81 commented 1 month ago

@nikoPLP I'm re-reading this 2-year-old message. Does it mean that once we stop needing your WAC extension (and possibly upgrade to Fuseki v5), all named graphs will be persisted? We'll probably prefer NextGraph over Fuseki, but being able to use both would also be nice. For now, the biggest thing preventing us from using Fuseki in the long term is that named graphs are not persisted, so we cannot store RDF documents in a quad store, as Solid requires.

nikoPLP commented 1 month ago

As far as I remember, Jena can save any quad without problem, unless you have configured some "hard-coded" graph names in the config, like I did for the ACLs. So Jena, being a quad store, shouldn't have any problem persisting graphs once we remove the ACL config. There is this old mail that confirms it: https://users.jena.apache.narkive.com/NeHP3TVw/how-to-call-create-delete-graph-in-fuseki

Just do e.g.
INSERT DATA { GRAPH :g { :s :p :o } }
and a graph is created for :g

Anyway, I don't think it would take much time to run a fresh Jena 5, insert 2 quads in different graphs, then stop Jena and see if they reappear after a restart. I bet they will :D
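The check described above could be scripted roughly as follows. This is only a sketch under assumptions that are not from this thread: a local Fuseki on port 3030, a persistent dataset with the hypothetical name `testData`, and the `update` and `query` service names. The graph and resource IRIs are placeholders too.

```python
import json
import urllib.parse
import urllib.request

# Assumptions (not from the thread): local Fuseki, persistent dataset
# named "testData", services exposed as /update and /query.
BASE_URL = "http://localhost:3030/testData"

# Lists every named graph that currently holds at least one triple.
LIST_GRAPHS = "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } }"


def build_insert(quads):
    """Build a SPARQL INSERT DATA update from (graph, s, p, o) IRI tuples."""
    blocks = [f"GRAPH <{g}> {{ <{s}> <{p}> <{o}> }}" for g, s, p, o in quads]
    return "INSERT DATA { " + " ".join(blocks) + " }"


def run_update(update, base_url=BASE_URL):
    """POST a SPARQL update using the SPARQL 1.1 Protocol direct form."""
    req = urllib.request.Request(
        base_url + "/update",
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def list_graphs(base_url=BASE_URL):
    """Return the sorted IRIs of all non-empty named graphs."""
    url = base_url + "/query?" + urllib.parse.urlencode({"query": LIST_GRAPHS})
    req = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return sorted(b["g"]["value"] for b in body["results"]["bindings"])


# Two placeholder quads in two different named graphs.
TEST_QUADS = [
    ("http://example.org/g1", "http://example.org/s",
     "http://example.org/p", "http://example.org/o"),
    ("http://example.org/g2", "http://example.org/s",
     "http://example.org/p", "http://example.org/o"),
]
```

To use it, call `run_update(build_insert(TEST_QUADS))`, stop and restart Fuseki, then call `list_graphs()` and check that both graph IRIs are still listed.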

In any case, I think performance will be at least one order of magnitude better with OxiGraph (which you now have embedded inside NextGraph, with a multi-dataset API). Memory consumption will also be much lower, and there is no need for compaction at all. The goal of integrating with NextGraph is also not just to provide better performance for the quad store. It is to benefit from the core functionalities of NextGraph, like collaboration in CODs, end-to-end encryption (for parts of the data that you might want encrypted, like DMs), easy replication of the data, and built-in capabilities. We will talk next week about that, starting with the question of permissions <-> capabilities.

srosset81 commented 1 month ago

@nikoPLP Thanks for the answer, that's good news! I'm well aware that the NextGraph solution will be far superior, but I think it would be great if SemApps could still support Jena Fuseki and maybe other quad stores, so that we are quad-store-agnostic. But we will see if this is possible and not too complicated...