calrissian / accumulo-recipes

Recipes & cookbooks for Accumulo.
http://www.calrissian.org
Apache License 2.0

Better query support for event store #54

Open eawagner opened 10 years ago

eawagner commented 10 years ago

Currently there are a lot of seemingly arbitrary restrictions to the query format that can be used for the event store. For example, ANDs can only be used at the bottom of the query tree, and only equals and not equals are supported. This store should have better query support.

For an example of using JEXL, look at the wikisearch example in the 1.4 tree of Accumulo: https://github.com/apache/accumulo/tree/1.4.5-SNAPSHOT/src/examples/wikisearch

This would probably be done piecemeal, adding more complex support over time, but at a minimum this store should be able to handle a query tree of arbitrary depth made up of ANDs and ORs.
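A query tree of arbitrary depth can be modeled with two kinds of nodes: leaf comparisons and boolean groups whose children may themselves be groups. The sketch below is a hypothetical in-memory illustration, assuming Java 16+ for records. The names `QueryNode`, `Eq`, `NotEq`, `And`, and `Or` are not from accumulo-recipes, and the tree is evaluated against a plain map rather than being pushed down into Accumulo iterators.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical node types for an arbitrarily deep AND/OR query tree.
// None of these names come from accumulo-recipes or mango-criteria.
interface QueryNode {
    boolean matches(Map<String, Object> event);
}

// Leaf: the attribute is present and equal to the value.
record Eq(String key, Object value) implements QueryNode {
    public boolean matches(Map<String, Object> event) {
        return event.containsKey(key) && Objects.equals(event.get(key), value);
    }
}

// Leaf: assumed semantics are "attribute present and different from value".
record NotEq(String key, Object value) implements QueryNode {
    public boolean matches(Map<String, Object> event) {
        return event.containsKey(key) && !Objects.equals(event.get(key), value);
    }
}

// Group: every child must match; children may be further And/Or groups.
record And(List<QueryNode> children) implements QueryNode {
    public boolean matches(Map<String, Object> event) {
        return children.stream().allMatch(c -> c.matches(event));
    }
}

// Group: at least one child must match.
record Or(List<QueryNode> children) implements QueryNode {
    public boolean matches(Map<String, Object> event) {
        return children.stream().anyMatch(c -> c.matches(event));
    }
}

class QueryTreeDemo {
    public static void main(String[] args) {
        // enabled == true AND (name == "rule1" OR name == "rule2")
        QueryNode q = new And(List.of(
            new Eq("enabled", true),
            new Or(List.of(new Eq("name", "rule1"), new Eq("name", "rule2")))));
        Map<String, Object> hit = Map.of("enabled", true, "name", "rule2");
        Map<String, Object> miss = Map.of("enabled", false, "name", "rule1");
        System.out.println(q.matches(hit));  // prints true
        System.out.println(q.matches(miss)); // prints false
    }
}
```

Because `And` and `Or` accept any `QueryNode` as a child, an OR can sit above or below an AND at any level, which removes the "ANDs only at the bottom" restriction in this model.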

cjnolet commented 10 years ago

+1 JEXL would be nice.

cjnolet commented 10 years ago

I'm actually not completely sold on JEXL after having worked with it on the wikisearch iterators. Its data structures aren't the easiest to work with, and it doesn't offer much more expressiveness than we already have with our mango-criteria code. For the record, I'm not advocating that we keep that code around forever either (as discussed offline with Alan).

The problem I have with JEXL is that it's too easy to screw up, and it doesn't express much in terms of data types, specifically pluggable data types. We've put a lot of effort into the APIs behind our pluggable types in the Calrissian platform, and normalizing the data and keeping the types cascading all the way to the server would muck up the JEXL implementation just enough to make it extremely painful for users.

On the project where I'm using the wikisearch iterators, I have the mango-criteria layer generating JEXL that looks like the following:

('enabled' == 'bool\x01true' and 'name' == 'string\x01rule1')
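The encoded literals above pair a type alias with the value, separated by a \x01 byte. A minimal sketch of producing that shape, assuming a simple class-to-alias registry: `TypeRegistry`, `encode`, and `andOfEquals` are hypothetical names, not the Calrissian type API, and `toString()` stands in for real normalization (a production encoder would need, for example, a sort-preserving encoding for numbers).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of alias-tagged values; not the Calrissian type registry.
class TypeRegistry {
    // Separator between the type alias and the value (written as \x01 in the thread).
    static final char SEP = '\u0001';

    private final Map<Class<?>, String> aliases = new LinkedHashMap<>();

    // "Pluggable": callers register their own alias for each Java type.
    TypeRegistry register(Class<?> type, String alias) {
        aliases.put(type, alias);
        return this;
    }

    // Encodes a raw value as alias + SEP + value; toString() is a stand-in for normalization.
    String encode(Object value) {
        String alias = aliases.get(value.getClass());
        if (alias == null) {
            throw new IllegalArgumentException("no alias registered for " + value.getClass().getName());
        }
        return alias + SEP + value;
    }

    // Renders key/value pairs as an AND of equalities, printing SEP as the text \x01.
    String andOfEquals(Map<String, Object> terms) {
        return terms.entrySet().stream()
            .map(e -> "'" + e.getKey() + "' == '"
                + encode(e.getValue()).replace(String.valueOf(SEP), "\\x01") + "'")
            .collect(Collectors.joining(" and ", "(", ")"));
    }
}

class TypedJexlDemo {
    public static void main(String[] args) {
        TypeRegistry types = new TypeRegistry()
            .register(Boolean.class, "bool")
            .register(String.class, "string");
        Map<String, Object> terms = new LinkedHashMap<>(); // keeps insertion order
        terms.put("enabled", true);
        terms.put("name", "rule1");
        // prints ('enabled' == 'bool\x01true' and 'name' == 'string\x01rule1')
        System.out.println(types.andOfEquals(terms));
    }
}
```

Keeping the alias next to the value is what lets the server side tell `bool\x01true` apart from `string\x01true`; how the real layer escapes the separator inside JEXL string literals is not stated in this thread.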

Also I've managed to expose our criteria builder through Groovy so that queries can be issued like the following:

q.and().eq("enabled", true).eq("name", "rule1").end()

One thing this builder pattern really makes available is the ability to normalize raw Java types. I'd like to make sure we aren't re-implementing the query layer just for the sake of re-implementing it. If we see a benefit that actually adds value (like the ability to do extremely complex optimizations), then I can see a place for that. For now, I'm not 100% sold on swapping out the builder we have just because. We've got users plugging away on this pattern in my current environment, and there hasn't been much need for complex optimizations. If they write a bad query... they write a bad query. It's possible in Pig too: you can shoot yourself in the foot if you aren't careful. JEXL isn't going to offer much of a fix for that either.
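The point about normalizing raw Java types can be illustrated with a fluent builder that tags each value as it is added. This is a hypothetical sketch, not the mango-criteria API: `CriteriaBuilder`, its separate `build()` method, and the `bool:`/`string:`/`long:` tags are assumptions, and the tree is rendered as a string rather than as real criteria objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical builder in the style of q.and().eq(...).eq(...).end().
class CriteriaBuilder {
    // An open and()/or() group that collects rendered children.
    private static final class Group {
        final String op;
        final List<String> children = new ArrayList<>();
        Group(String op) { this.op = op; }
        String render() { return op + "(" + String.join(", ", children) + ")"; }
    }

    private final Deque<Group> open = new ArrayDeque<>(); // stack of unclosed groups
    private String root;

    CriteriaBuilder and() { open.push(new Group("and")); return this; }
    CriteriaBuilder or() { open.push(new Group("or")); return this; }

    // Adds an equality to the innermost open group, normalizing the raw value.
    CriteriaBuilder eq(String key, Object raw) {
        if (open.isEmpty()) throw new IllegalStateException("eq() outside and()/or()");
        open.peek().children.add("eq(" + key + ", " + normalize(raw) + ")");
        return this;
    }

    // Closes the innermost group and attaches it to its parent, or makes it the root.
    CriteriaBuilder end() {
        Group done = open.pop();
        if (open.isEmpty()) root = done.render();
        else open.peek().children.add(done.render());
        return this;
    }

    String build() {
        if (!open.isEmpty() || root == null) throw new IllegalStateException("unbalanced and()/or()/end()");
        return root;
    }

    // Assumed normalization: raw Java types become alias-tagged values; ints widen to long.
    private static String normalize(Object raw) {
        if (raw instanceof Boolean) return "bool:" + raw;
        if (raw instanceof String) return "string:" + raw;
        if (raw instanceof Integer || raw instanceof Long) return "long:" + ((Number) raw).longValue();
        throw new IllegalArgumentException("unsupported type " + raw.getClass().getName());
    }
}

class CriteriaBuilderDemo {
    public static void main(String[] args) {
        String q = new CriteriaBuilder()
            .and().eq("enabled", true)
                .or().eq("name", "rule1").eq("name", "rule2").end()
            .end()
            .build();
        // prints and(eq(enabled, bool:true), or(eq(name, string:rule1), eq(name, string:rule2)))
        System.out.println(q);
    }
}
```

Since every value passes through `normalize` at the moment it enters the builder, callers can keep writing plain `true` or `"rule1"` while the query carries explicit types, which is the property the comment above argues a raw JEXL string does not give you for free.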

cjnolet commented 10 years ago

Another good thing to check out would be the VertexQuery and GraphQuery interfaces in the TinkerPop Blueprints project. They were very easy to plug into.

cjnolet commented 9 years ago

Can this be closed?