bripkens / lucene

Node.js lib to transform: lucene query → syntax tree → lucene query
MIT License

Grouping order for query like "𝑎 AND 𝑏 AND 𝑐" #34

Open laggingreflex opened 4 years ago

laggingreflex commented 4 years ago

It seems the query: "𝑎 AND 𝑏 AND 𝑐" is by default grouped as "𝑎 AND (𝑏 AND 𝑐)".

Would it be unreasonable to expect it to be grouped instead as "(𝑎 AND 𝑏) AND 𝑐"?

I'm creating a filter based on this library, and I stumbled upon a particular query that makes me think the latter might be more natural.

E.g.: For this data:

const data = [
  { /* 0 */ name: 'C-3PO', species: 'Droid', height: 1.7526, misc: {} },
  { /* 1 */ name: 'R2-D2', species: 'Droid', height: 1.1, misc: {} },
  { /* 2 */ name: 'Anakin Skywalker', species: 'Human', height: 1.9 },
  { /* 3 */ name: 'Obi-Wan Kenobi', species: 'Human', height: 1.8, misc: {} },
  { /* 4 */ name: 'Han Solo', species: 'Human', height: 1.8, misc: {} },
  { /* 5 */ name: 'Princess Leia', species: 'Human', height: 1.5, misc: {} },
];

If I query:

an AND NOT wan AND NOT han

I expect the result to be

{ /* 2 */ name: 'Anakin Skywalker', ... }

right?
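A minimal sketch of the flat reading behind that expectation, independent of the library. It assumes each bare term is a case-insensitive substring match on the `name` field only (the helper `nameHas` is hypothetical, and the data is trimmed to the names):

```javascript
// Names from the data above; the other fields are not needed for this check.
const data = [
  { name: 'C-3PO' },
  { name: 'R2-D2' },
  { name: 'Anakin Skywalker' },
  { name: 'Obi-Wan Kenobi' },
  { name: 'Han Solo' },
  { name: 'Princess Leia' },
];

// Assumption: a term matches when the name contains it, ignoring case.
const nameHas = (item, term) =>
  item.name.toLowerCase().includes(term.toLowerCase());

// Flat reading of: an AND NOT wan AND NOT han
const result = data.filter(
  (item) =>
    nameHas(item, 'an') && !nameHas(item, 'wan') && !nameHas(item, 'han')
);

console.log(result.map((item) => item.name)); // [ 'Anakin Skywalker' ]
```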

But that happens only when the query is specifically formatted as:

(an AND NOT wan) AND NOT han

To elaborate step-by-step:

Case 1: 'an AND NOT wan AND NOT han'

Query split as

{
  left: 'an', 
  operator: 'AND NOT', 
  right: 'wan AND NOT han'
}
  1. Parse left side 'an' = 3 results: [Anakin, Obi-Wan, Han Solo]

  2. Parse right side: 'wan AND NOT han'

    Query split as:

    {
      left: 'wan', 
      operator: 'AND NOT', 
      right: 'han'
    }
    1. Parse left side 'wan' = 1 result: [Obi-Wan]

    2. Parse right side 'han' = 1 result: [Han Solo]

    3. Apply operator AND NOT

        [Obi-Wan] AND NOT [Han Solo] 

      = 1 result: [Obi-Wan]

  3. Apply operator AND NOT

    [Anakin, Obi-Wan, Han Solo] AND NOT [Obi-Wan]

    = 2 results: [Anakin, Han Solo]

End Result: [Anakin, Han Solo]
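The Case 1 walk-through can be reproduced with a small recursive evaluator. This is only a sketch under assumptions: the node shape (a term string, or `{ left, operator, right }`) mirrors the splits shown above rather than the library's actual AST, and `matchTerm` and `evaluate` are hypothetical helpers that match terms as case-insensitive substrings of the name.

```javascript
const names = [
  'C-3PO',
  'R2-D2',
  'Anakin Skywalker',
  'Obi-Wan Kenobi',
  'Han Solo',
  'Princess Leia',
];

// Leaf: names containing the term, ignoring case (assumption).
const matchTerm = (term) =>
  names.filter((n) => n.toLowerCase().includes(term.toLowerCase()));

// Evaluate a node bottom-up: resolve both sides, then apply the operator.
function evaluate(node) {
  if (typeof node === 'string') return matchTerm(node);
  const left = evaluate(node.left);
  const right = evaluate(node.right);
  switch (node.operator) {
    case 'AND NOT':
      return left.filter((n) => !right.includes(n));
    case 'AND':
      return left.filter((n) => right.includes(n));
    default:
      throw new Error(`Unsupported operator: ${node.operator}`);
  }
}

// Case 1 grouping: an AND NOT (wan AND NOT han)
const rightNested = {
  left: 'an',
  operator: 'AND NOT',
  right: { left: 'wan', operator: 'AND NOT', right: 'han' },
};

console.log(evaluate(rightNested)); // [ 'Anakin Skywalker', 'Han Solo' ]
```

The inner node removes Han Solo from the "wan" matches, which is a no-op, so Han Solo is never subtracted from the outer "an" matches.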

Case 2: '(an AND NOT wan) AND NOT han'

Query split as:

{
  left: 'an AND NOT wan', 
  operator: 'AND NOT', 
  right: 'han'
}
  1. Parse left side 'an AND NOT wan'

    Query split as:

    {
      left: 'an', 
      operator: 'AND NOT', 
      right: 'wan'
    }
    1. Parse left side 'an' = 3 results: [Anakin, Obi-Wan, Han Solo]

    2. Parse right side 'wan' = 1 result: [Obi-Wan]

    3. Apply operator AND NOT

      [Anakin, Obi-Wan, Han Solo] AND NOT [Obi-Wan] 

      = 2 results: [Anakin, Han Solo]

  2. Parse right side 'han' = 1 result: [Han Solo]

  3. Apply operator AND NOT

    [Anakin, Han Solo] AND NOT [Han Solo]

    = 1 result: [Anakin]

End Result: [Anakin]
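The Case 2 split can also be derived mechanically from the Case 1 split by rotating the chain to the left. The sketch below assumes the same hypothetical node shape as the splits above (a term string, or `{ left, operator, right }`) and a flat chain with no explicit parentheses; `toLeftNested` is an illustrative helper, not part of the library, and a query that mixes operators such as AND and OR would additionally need precedence handling.

```javascript
// Case 1 split: an AND NOT (wan AND NOT han)
const rightNested = {
  left: 'an',
  operator: 'AND NOT',
  right: { left: 'wan', operator: 'AND NOT', right: 'han' },
};

// Walk down the right spine, folding each step into a growing left group.
function toLeftNested(node) {
  if (typeof node === 'string') return node;
  let grouped = node.left;
  let current = node;
  while (typeof current.right !== 'string') {
    grouped = {
      left: grouped,
      operator: current.operator,
      right: current.right.left,
    };
    current = current.right;
  }
  return { left: grouped, operator: current.operator, right: current.right };
}

const leftNested = toLeftNested(rightNested);
console.log(JSON.stringify(leftNested));
// {"left":{"left":"an","operator":"AND NOT","right":"wan"},"operator":"AND NOT","right":"han"}
```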

So, as you can see, only Case 2 gives the expected result.

If my expectations or my algorithm are flawed, I'd appreciate the correction.