gnewton opened 9 years ago
Yes, we are trying to adhere to this specification. Range query syntax was added recently, which is why it is incomplete (it only supports numeric ranges right now).
Proximity and wildcard searches are not implemented yet, so we cannot support them in the query syntax.
Fuzzy syntax is implemented, though we interpret the numerical argument as the edit distance. This is actually what Lucene does in more recent versions (in the doc you linked it's still a float, which most people didn't understand).
Also, in more recent versions Lucene has added support for multiple syntaxes, so I don't think there is a single "lucene" syntax any more. However, here is a link to a more recent version that I've been trying to follow: http://lucene.apache.org/core/4_10_3/core/index.html
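For readers unfamiliar with the edit-distance reading of the fuzzy syntax, here is a minimal Go sketch of plain Levenshtein distance and a fuzzy check built on it, so that a query like roam~1 matches terms at most one edit away. The function names are hypothetical and this is not bleve's implementation. Lucene's FuzzyQuery can additionally count a transposition of two adjacent characters as a single edit, which this sketch does not.

```go
package main

import "fmt"

// editDistance returns the Levenshtein distance between a and b: the minimum
// number of single-rune insertions, deletions or substitutions turning a into b.
func editDistance(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	// prev[j] is the distance between the first i-1 runes of a and the first j of b.
	prev := make([]int, len(rb)+1)
	cur := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			// deletion, insertion, substitution (or match)
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev, cur = cur, prev
	}
	return prev[len(rb)]
}

func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// fuzzyMatch reports whether term is within maxEdits of target, i.e. whether
// target~maxEdits would match term under the edit-distance interpretation.
func fuzzyMatch(target, term string, maxEdits int) bool {
	return editDistance(target, term) <= maxEdits
}

func main() {
	fmt.Println(editDistance("roam", "foam"))   // 1
	fmt.Println(fuzzyMatch("roam", "roams", 1)) // true
	fmt.Println(fuzzyMatch("roam", "ream", 0))  // false
}
```

An integer edit count is easier to reason about than the old float similarity: roam~1 means "one insertion, deletion or substitution", regardless of term length.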
That is great to hear!
Apologies for referencing the incorrect documentation.
Here is what the Lucene people are calling the "classic" syntax: http://lucene.apache.org/core/4_10_3/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description
Thanks for the great work! :-)
I've started some work on this. It's still at the mucking-about stage (and I've not yet signed a CLA), but you can track what I'm doing here:
https://github.com/bcampbell/bleve/tree/queryparser
The features I'm particularly interested in supporting are boolean expressions and date ranges, eg:
tags:( (orange AND lemon) OR citrus)
published:[2014-01-01 TO 2014-01-07]
Great, two quick thoughts:
See this section of the Elasticsearch documentation, which describes some of the problems:
If we introduce these operators, it would be nice to try and also get the precedence the same as Elasticsearch/Lucene so that queries behave in a predictable way.
Hopefully these details aren't discouraging, just some things to consider as you work.
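To make the precedence question concrete, here is a minimal recursive-descent sketch in Go in which AND binds tighter than OR, the conventional choice in most programming languages. That precedence is an illustrative assumption, not necessarily how Lucene's classic parser or Elasticsearch group operators, and the Node/Parse names are hypothetical. It handles only bare terms, AND, OR and parentheses; field prefixes such as tags: and NOT are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical parse-tree node: a leaf Term, or an Op ("AND"/"OR")
// applied to two children.
type Node struct {
	Op, Term    string
	Left, Right *Node
}

// String renders the tree fully parenthesised so the grouping is visible.
func (n *Node) String() string {
	if n.Op == "" {
		return n.Term
	}
	return "(" + n.Left.String() + " " + n.Op + " " + n.Right.String() + ")"
}

type parser struct {
	toks []string
	pos  int
}

// peek returns the current token, or "" at end of input.
func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

func (p *parser) next() string {
	t := p.peek()
	p.pos++
	return t
}

// binary parses `operand { op operand }`, building a left-leaning tree.
func (p *parser) binary(op string, operand func() (*Node, error)) (*Node, error) {
	left, err := operand()
	if err != nil {
		return nil, err
	}
	for p.peek() == op {
		p.next()
		right, err := operand()
		if err != nil {
			return nil, err
		}
		left = &Node{Op: op, Left: left, Right: right}
	}
	return left, nil
}

// Precedence: OR is loosest, AND binds tighter, primaries bind tightest.
func (p *parser) orExpr() (*Node, error)  { return p.binary("OR", p.andExpr) }
func (p *parser) andExpr() (*Node, error) { return p.binary("AND", p.primary) }

// primary := term | "(" orExpr ")"
func (p *parser) primary() (*Node, error) {
	switch t := p.next(); t {
	case "":
		return nil, fmt.Errorf("unexpected end of input")
	case "(":
		n, err := p.orExpr()
		if err != nil {
			return nil, err
		}
		if p.next() != ")" {
			return nil, fmt.Errorf("missing closing parenthesis")
		}
		return n, nil
	case ")", "AND", "OR":
		return nil, fmt.Errorf("unexpected %q", t)
	default:
		return &Node{Term: t}, nil
	}
}

// Parse splits on whitespace (after padding parentheses) and parses.
func Parse(s string) (*Node, error) {
	s = strings.NewReplacer("(", " ( ", ")", " ) ").Replace(s)
	p := &parser{toks: strings.Fields(s)}
	n, err := p.orExpr()
	if err != nil {
		return nil, err
	}
	if p.pos != len(p.toks) {
		return nil, fmt.Errorf("unexpected %q", p.peek())
	}
	return n, nil
}

func main() {
	for _, q := range []string{"orange AND lemon OR citrus", "orange AND (lemon OR citrus)"} {
		n, err := Parse(q)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(n)
	}
}
```

Here the first query groups as ((orange AND lemon) OR citrus). Because each rule delegates to the next-tighter one, adding a level such as NOT means inserting one more function between andExpr and primary.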
IndexMapping, but that's only available when the search is executed (the mapping is passed in via the Searcher() fn). I did have the idea of adding a GenericRangeQuery type which papers over the differences between NumericRangeQuery and DateRangeQuery (and maybe even alphabetic ranges between text terms like Lucene... not sure how feasible that is in bleve).
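The GenericRangeQuery idea could look roughly like this: a query that stores the raw endpoint strings and only commits to numeric or date semantics once the field type is known. Everything here is hypothetical: the type names, the Resolve method and the field-type strings are illustrative stand-ins rather than bleve's API, and the date branch assumes RFC3339 endpoints.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// Hypothetical stand-ins for concrete range queries; bleve's real
// NumericRangeQuery and DateRangeQuery have different fields and constructors.
type NumericRange struct {
	Field    string
	Min, Max float64
}

type DateRange struct {
	Field      string
	Start, End time.Time
}

// GenericRangeQuery (hypothetical) keeps the raw endpoint strings from the
// query string and defers deciding what kind of range it is until the
// field's type is known, e.g. from the index mapping at search time.
type GenericRangeQuery struct {
	Field, Min, Max string
}

// Resolve picks the concrete range type from fieldType, which stands in for
// whatever a mapping lookup would report ("number", "datetime", ...).
func (q GenericRangeQuery) Resolve(fieldType string) (interface{}, error) {
	switch fieldType {
	case "number":
		lo, err := strconv.ParseFloat(q.Min, 64)
		if err != nil {
			return nil, err
		}
		hi, err := strconv.ParseFloat(q.Max, 64)
		if err != nil {
			return nil, err
		}
		return NumericRange{Field: q.Field, Min: lo, Max: hi}, nil
	case "datetime":
		start, err := time.Parse(time.RFC3339, q.Min)
		if err != nil {
			return nil, err
		}
		end, err := time.Parse(time.RFC3339, q.Max)
		if err != nil {
			return nil, err
		}
		return DateRange{Field: q.Field, Start: start, End: end}, nil
	default:
		return nil, fmt.Errorf("field %q: no range support for type %q", q.Field, fieldType)
	}
}

func main() {
	// A hypothetical mapping lookup: field name -> field type.
	fieldTypes := map[string]string{"year": "number", "published": "datetime"}
	queries := []GenericRangeQuery{
		{Field: "year", Min: "2010", Max: "2014"},
		{Field: "published", Min: "2014-01-01T00:00:00Z", Max: "2014-01-07T00:00:00Z"},
	}
	for _, q := range queries {
		r, err := q.Resolve(fieldTypes[q.Field])
		fmt.Printf("%T %+v %v\n", r, r, err)
	}
}
```

With this shape the parser never has to guess whether an endpoint is a number or a date; that decision moves to the point where the mapping is available.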
It's also a little fiddly to distinguish between types at lexing/parsing time (e.g. 2014-05-01 is obviously a date, but is 20140501 a date or a number? etc.)
Anyway, like I said, it's all still a bit experimental.
tSTRING definition is getting a bit unwieldy.
I'm looking at the grammar now. One other thought: although Bleve can be customized to handle a variety of date formats, I think it's reasonable for the query string to support one, or possibly a small set of, unambiguous formats. My recommendation for now is to keep it simple: a date is simply a tSTRING that also happens to be parseable as RFC3339. That is the default we use in a lot of other places.
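A minimal sketch of the "a date is a tSTRING that parses as RFC3339" rule, applied to a range endpoint. The classify function and its three labels are hypothetical. The number fallback uses strconv.ParseFloat, which also accepts strings such as "NaN" and "Inf", so a real lexer might want a stricter numeric check.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// classify guesses the type of a range endpoint taken from the query string:
// anything parseable as RFC3339 is a date, otherwise anything parseable as a
// float is a number, otherwise it is a plain string.
// Note: ParseFloat also accepts "NaN", "Inf", etc.
func classify(tok string) string {
	if _, err := time.Parse(time.RFC3339, tok); err == nil {
		return "date"
	}
	if _, err := strconv.ParseFloat(tok, 64); err == nil {
		return "number"
	}
	return "string"
}

func main() {
	for _, tok := range []string{"2014-01-01T00:00:00Z", "20140501", "2014-05-01", "citrus"} {
		fmt.Printf("%-22s %s\n", tok, classify(tok))
	}
}
```

Under this rule 20140501 is a number, and a bare 2014-05-01 (the form used in the date-range example earlier) is neither a date nor a number, because Go's time.RFC3339 layout requires the time and zone parts.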
I've just pushed up my progress so far to a branch on my fork: https://github.com/bcampbell/bleve/tree/queryparser
I found myself running round and round in circles with yacc, so in the end decided to go with a noddy hand-rolled parser. If ever deemed worthwhile, I could probably translate it back to yacc and nex without too much hassle. I find the hand-rolled one easier to follow and reason about, but there's value in established conventions.
Notes:
- There's a bleve_queryparser tool to try out query strings and dump the results to stdout as JSON.
- Rather than using the must/should/must_not lists of boolean queries, it tends to build a binary tree of queries. I haven't looked deeper into the bleve internals to have any real sense of what kind of performance implications that might have...

I'll be back onto it next week.
Unless there is a compelling reason not to, could the bleve Query String Query syntax (https://github.com/blevesearch/bleve/wiki/Query-String-Query) adhere to the Lucene syntax (as defined in http://lucene.apache.org/core/3_5_0/queryparsersyntax.html)?
Differences between them that I see now:
Other things not yet implemented by bleve: