bripkens / lucene

Node.js lib to transform: lucene query → syntax tree → lucene query
MIT License

escaping tilde #11

Closed jakecadams closed 6 years ago

jakecadams commented 6 years ago

https://runkit.com/embed/kx7k2fbprecw

> lucene.parse('foo~bar:"hello"')
> {
  "left": {
    "boost": null,
    "field": "<implicit>",
    "prefix": null,
    "quoted": false,
    "similarity": 0.5,
    "term": "foo"
  },
  "operator": "<implicit>",
  "right": {
    "boost": null,
    "field": "bar",
    "prefix": null,
    "proximity": null,
    "quoted": true,
    "term": "hello"
  }
}

I'm having trouble escaping the tilde in the field name. Escaping seems to work for some other special characters. Any suggestions?

bripkens commented 6 years ago

This is most likely an issue with the grammar, which is far from correct. Feel free to contribute a fix :)

bripkens commented 6 years ago

I looked into this again. The query you defined is not valid according to the official Lucene query parser (the one written in Java):

Exception in thread "main" org.apache.lucene.queryparser.classic.ParseException: Cannot parse 'foo~bar:"hello"': Encountered " ":" ": "" at line 1, column 7.
Was expecting one of:
    <EOF> 
    <AND> ...
    <OR> ...
    <NOT> ...
    "+" ...
    "-" ...
    <BAREOPER> ...
    "(" ...
    "*" ...
    "^" ...
    <QUOTED> ...
    <TERM> ...
    <PREFIXTERM> ...
    <WILDTERM> ...
    <REGEXPTERM> ...
    "[" ...
    "{" ...
    <NUMBER> ...

What would be valid is an escaped tilde in the field name, i.e. foo\~bar:"hello"

bripkens commented 6 years ago

Unfortunately, this library misparses that as well :/

> require('.').parse('foo\~bar:"hello"')
{ left:
   { field: '<implicit>',
     term: 'foo',
     quoted: false,
     similarity: 0.5,
     boost: null,
     prefix: null },
  operator: '<implicit>',
  right:
   { field: 'bar',
     term: 'hello',
     quoted: true,
     proximity: null,
     boost: null,
     prefix: null } }
> require('.').parse('foo\\~bar:"hello"')
{ left:
   { field: '<implicit>',
     term: 'foo\\',
     quoted: false,
     similarity: 0.5,
     boost: null,
     prefix: null },
  operator: '<implicit>',
  right:
   { field: 'bar',
     term: 'hello',
     quoted: true,
     proximity: null,
     boost: null,
     prefix: null } }
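
One detail worth keeping in mind when reading the two REPL calls above: in a JavaScript string literal, `\~` is not a recognized escape sequence, so the backslash is dropped before the string ever reaches the parser. Only the second call, written with `\\~`, actually passes a backslash. A minimal sketch in plain JavaScript, independent of this module (the variable names are illustrative only):

```javascript
// "\~" is not a JavaScript escape sequence, so the backslash is discarded.
const singleBackslash = 'foo\~bar:"hello"';
// "\\" is an escaped backslash, so the string contains a literal backslash.
const doubleBackslash = 'foo\\~bar:"hello"';

console.log(singleBackslash === 'foo~bar:"hello"'); // true
console.log(doubleBackslash.includes('\\'));        // true

// String.raw keeps backslashes as typed, which avoids the doubling.
console.log(String.raw`foo\~bar:"hello"` === doubleBackslash); // true
```

So the first call is equivalent to the query from the original report, and the second call is the one that exercises the escaped form.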

bripkens commented 6 years ago

The escaping strategy changed in release 2.0.0, so this issue no longer applies. In addition, the module now provides helpers for easy escaping and unescaping.
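
For readers who want to see what escaping a term involves, here is a rough standalone sketch. It is not this module's helper API (see the README for that). `escapeLuceneTerm` is a hypothetical name, and the character list is an assumption modeled on the set that the Java `QueryParserBase.escape` method escapes.

```javascript
// Hypothetical helper, NOT part of this module's API.
// Characters that carry syntactic meaning in the classic Lucene query syntax
// (assumed list, modeled on Java's QueryParserBase.escape).
const LUCENE_SPECIAL_CHARS = /[+\-!(){}[\]^"~*?:\\\/&|]/g;

// Prefix every special character with a backslash.
function escapeLuceneTerm(term) {
  return term.replace(LUCENE_SPECIAL_CHARS, '\\$&');
}

// The escaped field name from this issue, combined with a quoted phrase.
const query = escapeLuceneTerm('foo~bar') + ':"hello"';
console.log(query); // foo\~bar:"hello"
```

Building the query this way sidesteps the string-literal pitfall, because the backslash is added at runtime rather than typed into the source.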