bripkens / lucene

Node.js lib to transform: lucene query → syntax tree → lucene query
MIT License

`/` in the query results in parse error #20

Closed by okonet 5 years ago

okonet commented 5 years ago

https://runkit.com/embed/j4erp5p1jqly

var lucene = require("lucene")
lucene.parse('field:test/')

results in

peg$SyntaxError: Expected "!", "&&", "(", "+", "-", ".", "AND NOT", "AND", "NOT", "OR NOT", "OR", "[", "\"", "\\", "^", "{", "||", "~", [^: \t\r\n\x0C{}()"/\^~[\]], end of input, or whitespace but "/" found.

but according to https://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Escaping%20Special%20Characters, `/` does not need to be escaped.

I'm not sure whether this is a bug in the parser or invalid Lucene syntax.

bripkens commented 5 years ago

This is actually invalid lucene syntax @okonet. This is what happens when you do this with the official Java client:

Query query = new QueryParser("<implicit>", new CustomAnalyzer()).parse("field:test/");
System.out.println(query);
Exception in thread "main" org.apache.lucene.queryparser.classic.ParseException: Cannot parse 'field:test/': Lexical error at line 1, column 12.  Encountered: <EOF> after : ""
    at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:122)
    at QueryParserExperiments.main(QueryParserExperiments.java:17)
Caused by: org.apache.lucene.queryparser.classic.TokenMgrError: Lexical error at line 1, column 12.  Encountered: <EOF> after : ""
    at org.apache.lucene.queryparser.classic.QueryParserTokenManager.getNextToken(QueryParserTokenManager.java:1130)
    at org.apache.lucene.queryparser.classic.QueryParser.jj_ntk(QueryParser.java:628)
    at org.apache.lucene.queryparser.classic.QueryParser.Term(QueryParser.java:321)
    at org.apache.lucene.queryparser.classic.QueryParser.Clause(QueryParser.java:247)
    at org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:171)
    at org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:160)
    at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:117)
    ... 1 more

okonet commented 5 years ago

Yep, that’s why I’m wondering why `/` is not being escaped then. I should be able to search for such a query, right?

bripkens commented 5 years ago

Escaping the / makes it work:

[Screenshot, 2018-10-26 at 09:26:22]

okonet commented 5 years ago

Yes, I know that. I’m just wondering whether it should be added to the escape function.

bripkens commented 5 years ago

This is actually already part of the escape function:

> require('lucene').term.escape('test/')
'test\\/'
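
For illustration, the kind of escaping shown above can be sketched in a few lines. This is a minimal sketch and not the library's actual implementation: `escapeTerm` and `SPECIAL_CHARS` are hypothetical names, and the character set is an assumption modeled on the special characters of the classic Lucene query parser in newer Lucene versions, where `/` is among them.

```javascript
// Hypothetical helper for illustration; not part of the lucene package.
// Assumption: each of these characters needs a leading backslash in a term.
const SPECIAL_CHARS = /[\\+\-!():^\[\]"{}~*?|&\/]/g;

function escapeTerm(term) {
  // '$&' re-inserts the matched character after the added backslash.
  return term.replace(SPECIAL_CHARS, '\\$&');
}

console.log(escapeTerm('test/')); // prints test\/
```

When using the library itself, the `term.escape` call shown above is the one to reach for; the sketch only makes the idea concrete.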

okonet commented 5 years ago

Oh nice. I totally missed it! I was looking at the source code but was probably too tired. Thanks for the quick response!