flaxsearch / luwak

A java library for stored queries

How to do numeric comparison #139

Open govindpatel opened 7 years ago

govindpatel commented 7 years ago

Hi, I wanted to do numeric comparisons such as greater than, less than, etc., so I wrote this:

// this is my document
String titleString = "THis is the text first to be indexed";
String documentString = "THis is my first document. " +
                "Have to test this text to my best and also check for all the use case where " +
                "and or not etc can come in between and nested boolean\n expression is also something I have to check";
InputDocument document = InputDocument.builder("doc1")
                .addField("title", titleString, new StandardAnalyzer())
                .addField("document", documentString, new StandardAnalyzer())
                .addField(new LongField("upvote", 1L, Field.Store.YES)) // I also tried this `.addField("upvote", "1", new StandardAnalyzer())`
                .build();
Monitor monitor = new Monitor(new LuceneQueryParser(null), new TermFilteredPresearcher());
// title should contain `text`
MonitorQuery monitorQuery0 = new MonitorQuery("query0", "title:text");
// upvote >= 10
MonitorQuery monitorQuery1 = new MonitorQuery("query1", "upvote:[10 TO *]");
// same upvote >= 10
MonitorQuery monitorQuery2 = new MonitorQuery("query2", NumericRangeQuery.newLongRange("upvote", 10L, Long.MAX_VALUE, true, true).toString());

ArrayList<MonitorQuery> monitorQuery = new ArrayList<MonitorQuery>();
monitorQuery.add(monitorQuery0);
monitorQuery.add(monitorQuery1);
monitorQuery.add(monitorQuery2);
List<QueryError> errors = monitor.update(monitorQuery);

// match queries to document
Matches<QueryMatch> matches = monitor.match(document, SimpleMatcher.FACTORY);
// print the matches
for (String s : matches.getPresearcherHits()) {
       System.out.println("QueryMatched : " + s);
}
monitor.close();

The above program gives the following result:

QueryMatched : query3
QueryMatched : query0
QueryMatched : query1

The document should not have matched "query1" and "query2", because I wanted those to match only if the document has upvote >= 10.

I have a bunch of things I wanted to try with luwak:

  1. Numeric range query (like greater than, less than, equals, etc)
  2. Text query (like contains, equals, startsWith, endsWith)
  3. Is it possible to combine and, or and not?

I tried my best but didn't find anything, so I am creating an issue. Please help. Thanks.

romseygeek commented 7 years ago

Luwak just provides the default LuceneQueryParser implementation for convenience, but for more complicated queries you're going to need to write your own parser. In particular, the default Lucene query parser doesn't handle Point queries or Span queries. And/Or/Not you can do with simple boolean google-like operators of + and -.
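
As a rough illustration of the + and - operators mentioned above, the sketch below assembles a classic-syntax query string from required, optional, and prohibited clauses. The class BooleanQueryStrings and its build method are hypothetical helper names for this example, not part of Luwak or Lucene, and the field names are taken from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not a Luwak or Lucene class.
class BooleanQueryStrings {

    // Prefix required clauses with '+', prohibited clauses with '-',
    // and leave optional clauses unprefixed.
    static String build(List<String> must, List<String> should, List<String> mustNot) {
        List<String> parts = new ArrayList<>();
        for (String clause : must) {
            parts.add("+" + clause);
        }
        for (String clause : should) {
            parts.add(clause);
        }
        for (String clause : mustNot) {
            parts.add("-" + clause);
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        String query = build(
                Arrays.asList("title:text", "document:first"),   // AND: both required
                Arrays.asList("document:boolean"),                // optional clause
                Arrays.asList("document:nested"));                // NOT: must be absent
        // prints: +title:text +document:first document:boolean -document:nested
        System.out.println(query);
    }
}
```

The resulting string could then be passed as the query text of a MonitorQuery, as in the question. A query made up only of prohibited clauses matches nothing in Lucene, so at least one positive clause is normally needed.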

govindpatel commented 7 years ago

@romseygeek you said:

"And/Or/Not you can do with simple boolean google-like operators of + and -"

"but for more complicated queries you're going to need to write your own parser"

Can you elaborate a little on those? Thanks.

romseygeek commented 7 years ago

Those docs are pretty out-of-date, you want to be looking at https://lucene.apache.org/core/6_5_0/queryparser/index.html instead, I think.

In particular, for Point fields, I think what you want to do is extend QueryParserBase and override getRangeQuery(). The reason that the default query parser doesn't handle point fields is that Lucene is schemaless, so the parser doesn't know whether to generate term range queries, long range queries, point range queries, etc.
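
To make the ambiguity concrete without depending on Lucene itself, here is a minimal plain-Java sketch of how the same range, [10 TO *], behaves under term (string) semantics versus long semantics. The class RangeSemantics and its methods are hypothetical names for this illustration; they only mimic the comparison rules and are not the Lucene implementation.

```java
// Illustration only: RangeSemantics is a hypothetical class, not part of Lucene or Luwak.
class RangeSemantics {

    // Term-range semantics: value and bounds are compared as strings.
    // A null upper bound stands for the open end, i.e. the * in [10 TO *].
    static boolean inTermRange(String value, String lower, String upper) {
        boolean aboveLower = value.compareTo(lower) >= 0;
        boolean belowUpper = upper == null || value.compareTo(upper) <= 0;
        return aboveLower && belowUpper;
    }

    // Long-range semantics: the same bounds are parsed as numbers first.
    static boolean inLongRange(String value, String lower, String upper) {
        long v = Long.parseLong(value);
        long lo = Long.parseLong(lower);
        long hi = upper == null ? Long.MAX_VALUE : Long.parseLong(upper);
        return lo <= v && v <= hi;
    }

    public static void main(String[] args) {
        String[] upvotes = {"1", "2", "9", "10", "100"};
        for (String upvote : upvotes) {
            System.out.println("upvote=" + upvote
                    + " term=" + inTermRange(upvote, "10", null)
                    + " long=" + inLongRange(upvote, "10", null));
        }
    }
}
```

Under string comparison, "2" and "9" fall inside [10 TO *] because they sort after "10", while under long comparison only 10 and 100 do. A parser that has no schema cannot tell which of these behaviours the field needs, which is why an overridden getRangeQuery() has to make that decision per field.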

You might be better off asking this sort of question directly on the Lucene mailing list.

mithranalandur commented 6 years ago

Hi, I tried overriding getRangeQuery() but I am still having the same issue.

While debugging, I see the response [query1, query3] instead of [query1].

The query passed to the IndexSearcher inside the monitor (QueryIndex.java, line 110) is "title:test1 anytokenfield:ANYTOKEN__" instead of "title:test1 upvote:[12 TO 2147483647]".

The complete code that produces [query1, query3] is at https://gist.github.com/mithranalandur/ce170ce287eca2a8f2cfaab6342068df

Can you give me some pointers on where I am going wrong?