Closed Ironykins closed 6 years ago
I have a question: I execute the following query "field: value \: *" through the library I get error does this fix (Candidate 1.4) solve this kind of problem that I have described to you?
@lucecc: No, this will not permit parsing of field: value \: *
, as this is not a valid lucene query (validated with the official Java based lucene query parser).
@Ironykins: While I like the idea of keeping track of the exact user input, I don't really like that this is a breaking change. Would it be possible for you to change this PR such that.
rawTerm
?term
field is unchanged?This way it wouldn't be a breaking change and you would still be able to access the raw input of the user.
On second thought, I now understand more clearly where you are coming from. Let me revisit this again.
Ignore my ignorant comment (https://github.com/bripkens/lucene/pull/16#issuecomment-375739422). I am taking a deeper look at your changes right now and will manually merge them to master with some smaller teaks. As mentioned before, I didn't understand at first why you were doing this, but it completely makes sense now :).
I picked your changes to master in https://github.com/bripkens/lucene/commit/e39a772ff6267515f828200d0f4bc5bacf9ae78d. Also, I made some additional changes such that we clearly indicate and handle the breaking change (and to provide helpers to account for these breaking changes). Once the build is done, I'll release it as 2.0.0.
Thank you very much for this PR!
Released as 2.0.0: https://github.com/bripkens/lucene/blob/master/CHANGELOG.md#200
Thanks very much! I love what you're doing with this project.
@bripkens I do not understand if the release 2.0.0 supports the Apache Lucene specification: https://lucene.apache.org/core/2_9_4/queryparsersyntax.html
In particular, the Escaping Special Characters section: Lucene supports escaping special characters that are part of the query syntax. The current list special characters are
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \
To escape these character use the \ before the character. For example to search for (1+1):2 use the query:
\(1\+1\)\:2
@raicast: If this is a bug report, then please open a separate ticket.
This being said:
> require('lucene').parse('\\(1\\+1\\)\\:2')
{ left:
{ field: '<implicit>',
fieldLocation: null,
term: '\\(1\\+1\\)\\:2',
quoted: false,
termLocation: { start: [Object], end: [Object] },
similarity: null,
boost: null,
prefix: null } }
This involves a small tweak to the grammar such that certain backslash-prepended characters get added as-is into strings in the AST, rather than parsed and turned into properties.
Since this changes how the AST internally stores its strings, it's possible that users who are parsing strings directly out of the AST may experience breaking changes, but it is very very unlikely. This should be backwards compatible in all common use cases, so I've only bumped the minor version number.
I've added some new unit tests, and old unit tests are almost entirely intact, except in one case where I had to compare against an escaped character.