eclipse-rdf4j / rdf4j

Eclipse RDF4J: scalable RDF for Java
https://rdf4j.org/
BSD 3-Clause "New" or "Revised" License
364 stars 164 forks source link

Deprecate SPIN support #1262

Closed abrokenjester closed 5 years ago

abrokenjester commented 5 years ago

Now that SHACL support is maturing I think we need to make a decision on whether we choose to continue support SPIN reasoning. From my perspective everything SPIN can do, SHACL can do as well (or better), there's a fairly straightforward migration from SPIN notation to SHACL, and there's no real need to continue to support both.

We also need to focus our resources: there's only so many developers on this project with so many hours, and the lead developer of the original SPIN engine is no longer very actively involved.

In my opinion: we should deprecate SPIN and focus any future development and maintenance on SHACL instead.

barthanssens commented 5 years ago

To be clear, by deprecated, do you mean "bug-fixes only", or "to be removed in 3.0" ?

abrokenjester commented 5 years ago

I mean bug fixes only, really. I wouldn't remove until 4.0 or later, and not until we are sure that we can fully support everything SPIN offers in the new SHACL engine.

@hmottestad pointed out somewhere else that there are use cases for SPIN that SHACL doesn't really cover. I don't quite understand why that is the case, as this article seems to suggest that SHACL can do everything SPIN can. Havard, maybe I misunderstood and you only meant to suggest that our current SHACL implementation doesn't fully cover SPIN use cases?

hmottestad commented 5 years ago

The sparql based “inferencer” in SPIN was not standardized in SHACL as far as I can tell. The features your article mentions are not mentioned in the standard. I think TopQuadrant got a lot of pushback from the community and ended up having to focus SHACL on the usecase of validation. From what I can tell there is no rule/sparql based “inference” in SHACL.

hmottestad commented 5 years ago

In general I’m positive to deprecating and removing SPIN. I would like to develop a lightweight alternative though, for those who still want to use SPARQL for “inference”. A simple SAIL that takes UPDATE queries and runs them until no more changes are produced.

The test suite for SPIN is the ultimate proof of why SPIN should be removed. If I remember correctly part of it is still @ignored because it’s too unstable and slow.

hmottestad commented 5 years ago

Btw. This is the document that is referenced wrt. inference: https://www.w3.org/TR/shacl-af

“Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.”

I don’t think this will go anywhere. It’s equivalent of sneaking parts of the XSLT spec into the XSD spec.

pulquero commented 5 years ago

As has been pointed out, there are things that can be done in SPIN that have no analogue in SHACL. But, I think it is fair enough that the SPIN SAIL be considered for removal in 4.0. In some sense, what is important about SPIN is not SPIN itself, but the changes to RDF4J that made it possible. With that in mind, moving the SPIN SAIL in-house or to an independent github project wouldn't be too problematic. I would say it is more important that RDF4J makes it possible for consumers to build such things than provide such things (I'm still finding occasions where I'm having to override 2-3 classes just to customise a single method :/).

Things worth saving:

barthanssens commented 5 years ago

@pulquero would you have a list of classes that needs to be overridden (or does it depend on use case), might be a good opportunity to change them in 3.0...

pulquero commented 5 years ago

Sure, modifying the list of query optimizers. This requires subclassing a sail to return a new sailconnection subclass that uses a different set of query optimizers. An example of something like this being fixed was the introduction of EvaluationStrategyFactory. Broadly there is a lack of points to plug in strategies for doing things.

abrokenjester commented 5 years ago

@pulquero interesting idea, though I'm not immediately sure I understand why you'd need this. We're getting a bit off-topic so feel free to log this idea as a separate ticket and continue discussion there, but: what use case do you have in mind where you need to be able to change the set of query optimizers used by, for example, the native store?

pulquero commented 5 years ago

To add new custom optimizers, for example converting filter expressions to btree range scans on literals. Or generate new ast nodes that track query statistics and cardinalities. Then there are interpreters for tuplefunctions, making it possible to call out to algorithms such as page rank.

abrokenjester commented 5 years ago

Is there a reason these ideas for optimisers are 'custom'? What I'm getting at is couldn't we just add such optimisers to the core systems instead of making it configurable from outside? Or are these somehow only applicable in certain use cases or something?

Oh and not sure what you mean with adding extra ast nodes. The optimisers work on the algebra, not the ast.

As for tuple functions and calling out to external algorithms, sounds great but I'm not sure I understand why that needs to happen in an optimiser.

Sorry if I'm being dense, it's 36 degrees in the shade here, so I'm looking for simple answers ;)

On Sat, 2 Feb. 2019, 12:56 pulquero <notifications@github.com wrote:

To add new custom optimizers, for example converting filter expressions to btree range scans on literals. Or generate new ast nodes that track query statistics and cardinalities. Then there are interpreters for tuplefunctions, making it possible to call out to algorithms such as page rank.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/eclipse/rdf4j/issues/1262#issuecomment-459924383, or mute the thread https://github.com/notifications/unsubscribe-auth/AAseyFw569fiaBoS5tVkEhBJMZdqFZUlks5vJPBngaJpZM4aT6K4 .

hmottestad commented 5 years ago

-5C in the shade here.

pulquero commented 5 years ago

You could but that doesn't make it any easier to develop them in the first place. And also it doesn't make it easy for you to add them. Atm, you need to add a new optimizer to every sail impl. If there was a strategy/factory interface, there could be some base class that it would just need to be added to.

Sorry, I meant algebra not ast.

"Optimizer" is where algebra rewritting happens, so it is currently the best place to replace a set of nodes with a tuplefunctioncall. QueryParser and EvaluationStrategy(Factory) can be customised/extended but the algebra transformation bit in the middle is lacking a similar treatment.

abrokenjester commented 5 years ago

I've created a new ticket for the idea to make optimizers injectable.

Back to SPIN deprecation: given the feedback, I am leaning towards not making it deprecated for now. There's clearly still a use case (rule-based inference) where we have no better alternative.

pulquero commented 5 years ago

I think it is also worth noting that it is more than just rule-based inferencing (that has no alternative). As I sort of alluded to, I use alot of the spin stuff in an ad-hoc way, rather than via the SAIL, e.g. SpinParser. There is no SHACL equivalent to doing these types of things.

hmottestad commented 5 years ago

I’m not sure what the spin parser does.

One important aspect of the SHACL implementation is that it is not a SHACL -> SPARQL converter but instead generates more advanced query plans based on what has changed in the transaction to query the underlying data store, added statements in the transaction, removed statements and also the state of the underlying data store before the start of the transaction.

HolgerKnublauch commented 5 years ago

(Bumped into this via Twitter). I don't think the status of the WG note should be an obstacle to adoption. It is much closer to a W3C standard than the SPIN member submission. The WG just wasn't chartered to publish this as an official deliverable, there would certainly have been enough interest otherwise. I'd say once there is a RDF4J SHACL rule engine, there is no need to keep SPIN around. Off the top of my head, the only features that SPIN has that don't have an equivalent in SHACL are magic properties and the SPIN RDF syntax. Both however have influenced the design of SHACL node expressions (see the SHACL-AF document) and, more recently, SHACL property value rules (only part of the newer SHACL-AF drafts). The latter might be more efficient to implement for on-the-fly reasoning than the SPARQL-based rules. We are using them a lot nowadays.

nbassili commented 5 years ago

What about a migration tool for existing ontologies with SPIN to SHACL?

hmottestad commented 5 years ago

That would be great. I know SHACL quite well by now, but I don’t know SPIN enough. My experience with SPIN has been that most of the restrictions are defined using SPARQL, which means that you would need to create a SPARQL restrictions to SHACL converter. That might be a lot harder.

Would be great if you give it a shot though and make a pull request.

HolgerKnublauch commented 5 years ago

For the record, I don't know about a SPIN-to-SHACL converter. Some patterns (such as spin:rules with SPARQL) should be straight-forward to convert, almost with a bunch of SPARQL UPDATE queries. TBC has features to convert between the SPIN RDF (triples) syntax and sp:text, which is then easier to convert to sh:construct etc.

hmottestad commented 5 years ago

I guess you are thinking of the SPARQL support in SHACL, and the inference support in the working group note.

I see a lot of use for general SPARQL support in SHACL. However I'm quite apprehensive of supporting it due to how badly it scales and how easy it is for users to shoot themselves in the foot.

hmottestad commented 5 years ago

I’m thinking we should mark SPIN as deprecated and just leave it there for the foreseeable future.

I’ve had two questions in the last couple of months about SPIN performance and inconsistent results.

SPIN was never really meant to work on a live database, or that’s at least my interpretation. Someone had made a cool construct query that wasn’t updating correctly because our SPIN implementation requires a triple to be deleted before it will clear any inferred statements. The SPIN spec doesn’t detail this kind of problem at all.

For the SPIN performance issues. I’ve made SPIN 10x faster in 3.0.0, but it’s still slow. The user who commented added one statement and then deleted it, which took around 2-3 seconds.

SPIN is one of those technologies that woo new developers with “look how simple and easy this is” and then makes the semantic tech stack look immature once you see how terrible the performance is.

HolgerKnublauch commented 5 years ago

Yes, SPIN should be deprecated as it is being replaced by SHACL. From my perspective, SPIN was never really designed for performance but rather for expressiveness. Of course, once you open up all of SPARQL's richness, it is easy to run into slow queries and hard to implement optimizations that work with every possible query. The responsibility of using the richness wisely lies with the user.