apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

lucene expressions module [LUCENE-5207] #6271

Closed asfimport closed 11 years ago

asfimport commented 11 years ago

Expressions are geared toward defining an alternative ranking function (e.g. one incorporating the text relevance score and other field values/ranking signals). So they are conceptually much more like ElasticSearch's scripting support (http://www.elasticsearch.org/guide/reference/modules/scripting/) than Solr's function queries.

Some additional notes:


Migrated from LUCENE-5207 by Ryan Ernst (@rjernst), 1 vote, resolved Sep 15 2013. Attachments: LUCENE-5207.patch (versions: 3). Linked issues:

asfimport commented 11 years ago

Ryan Ernst (@rjernst) (migrated from JIRA)

First patch. It still needs more documentation and tests, but the core APIs are there.

Note there is generated code in the patch, so it is a little bloated.

asfimport commented 11 years ago

Ryan Ernst (@rjernst) (migrated from JIRA)

Here are some examples of how the API works:

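```java
import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;

// compile an expression:
Expression expr = JavascriptCompiler.compile("sqrt(_score) + ln(popularity)");

// SimpleBindings just maps variables to SortField instances
SimpleBindings bindings = new SimpleBindings();
bindings.add(new SortField("_score", SortField.Type.SCORE));
bindings.add(new SortField("popularity", SortField.Type.INT));

// create a sort field and sort by it (reverse order)
Sort sort = new Sort(expr.getSortField(bindings, true));
Query query = new TermQuery(new Term("body", "contents"));
searcher.search(query, null, 10, sort); // searcher: an existing IndexSearcher
```
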
asfimport commented 11 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Wow, this sounds awesome!

So, you can use an arbitrary JavaScript expression to combine DV fields and score into a new dynamic field for sorting?

E.g., a blended relevance + recency sort (which I do w/ a custom comparator now on http://jirasearch.mikemccandless.com).

asfimport commented 11 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Hi, very cool!

I like the elegant parser!

It's also the correct way to do the dynamic classes: one class per classloader, so it can be unloaded easily. The JVM does the same for reflection speedups (it creates stubs for Method#invoke calls, one stub per classloader). The "Loader" class is already "policeman-certified" and used by my code, too.
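A minimal JDK-only sketch of the "one class per classloader" pattern. It is not the module's Loader class: `OneShotLoader`, `ClassPerLoaderDemo` and the class name `GeneratedExpression` are hypothetical, and instead of ASM output it hand-assembles the smallest valid class file (an empty public interface). It shows that the same class name can be defined once per fresh loader, yielding distinct classes; once nothing references a loader, it and its single class become eligible for unloading (that last part is not demonstrated here).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Each instance defines exactly one generated class. */
final class OneShotLoader extends ClassLoader {
  OneShotLoader(ClassLoader parent) {
    super(parent);
  }

  Class<?> define(String binaryName, byte[] classFile) {
    return defineClass(binaryName, classFile, 0, classFile.length);
  }
}

final class ClassPerLoaderDemo {
  private ClassPerLoaderDemo() {}

  /** Hand-assembles a class file for an empty public interface (stand-in for ASM output). */
  static byte[] emptyInterface(String internalName) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeInt(0xCAFEBABE);                           // magic
      out.writeShort(0);                                  // minor_version
      out.writeShort(52);                                 // major_version (Java 8 format)
      out.writeShort(5);                                  // constant_pool_count: entries #1..#4
      out.writeByte(7); out.writeShort(2);                // #1 CONSTANT_Class -> #2
      out.writeByte(1); out.writeUTF(internalName);       // #2 CONSTANT_Utf8 (u2 length + bytes)
      out.writeByte(7); out.writeShort(4);                // #3 CONSTANT_Class -> #4
      out.writeByte(1); out.writeUTF("java/lang/Object"); // #4 CONSTANT_Utf8
      out.writeShort(0x0601);                             // ACC_PUBLIC | ACC_INTERFACE | ACC_ABSTRACT
      out.writeShort(1);                                  // this_class = #1
      out.writeShort(3);                                  // super_class = #3
      out.writeShort(0);                                  // interfaces_count
      out.writeShort(0);                                  // fields_count
      out.writeShort(0);                                  // methods_count
      out.writeShort(0);                                  // attributes_count
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e); // cannot happen for an in-memory stream
    }
  }

  /** Defines the class in a brand-new loader every time, so the name can stay constant. */
  static Class<?> defineInFreshLoader(String binaryName) {
    OneShotLoader loader = new OneShotLoader(ClassPerLoaderDemo.class.getClassLoader());
    return loader.define(binaryName, emptyInterface(binaryName.replace('.', '/')));
  }
}
```

Calling `defineInFreshLoader("GeneratedExpression")` twice returns two different `Class` objects with the same name, each owned by its own loader.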

I am not happy about the various raw "signatures" in binary form. I would rewrite JavascriptFunction to take java.lang.reflect.Method objects and use http://asm.ow2.org/asm40/javadoc/user/org/objectweb/asm/commons/Method.html#getMethod(java.lang.reflect.Method) or http://asm.ow2.org/asm40/javadoc/user/org/objectweb/asm/Type.html#getMethodDescriptor(java.lang.reflect.Method) to get the signature and argument count. The other hardcoded "L/foo/bar" stuff should be rewritten to use http://asm.ow2.org/asm40/javadoc/user/org/objectweb/asm/Type.html#getDescriptor(java.lang.Class). I can help with that. By using real reflected types we get the safety that all methods are syntactically correct and checked at runtime/compile time.
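To make the descriptor idea concrete, here is a JDK-only sketch of what ASM's `Type.getDescriptor(Class)` and `Type.getMethodDescriptor(Method)` compute, derived from reflected types instead of hand-written strings. `Descriptors` and its helpers are hypothetical names; the JDK's own `MethodType.toMethodDescriptorString()` can be used as a cross-check.

```java
import java.lang.reflect.Method;

final class Descriptors {
  private Descriptors() {}

  /** JVM type descriptor, e.g. double -> "D", String -> "Ljava/lang/String;", double[] -> "[D". */
  static String typeDescriptor(Class<?> c) {
    if (c.isArray()) return "[" + typeDescriptor(c.getComponentType());
    if (c == double.class) return "D";
    if (c == float.class) return "F";
    if (c == long.class) return "J";
    if (c == int.class) return "I";
    if (c == short.class) return "S";
    if (c == char.class) return "C";
    if (c == byte.class) return "B";
    if (c == boolean.class) return "Z";
    if (c == void.class) return "V";
    return "L" + c.getName().replace('.', '/') + ";";
  }

  /** JVM method descriptor, e.g. Math.atan2(double, double) -> "(DD)D". */
  static String methodDescriptor(Method m) {
    StringBuilder sb = new StringBuilder("(");
    for (Class<?> p : m.getParameterTypes()) {
      sb.append(typeDescriptor(p));
    }
    return sb.append(')').append(typeDescriptor(m.getReturnType())).toString();
  }

  /** getMethod without the checked exception, to keep call sites short. */
  static Method lookup(Class<?> owner, String name, Class<?>... params) {
    try {
      return owner.getMethod(name, params);
    } catch (NoSuchMethodException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

For example, `Descriptors.methodDescriptor(Descriptors.lookup(Math.class, "atan2", double.class, double.class))` yields `(DD)D`, and the argument count falls out of `getParameterTypes().length` rather than a hand-maintained number.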

I think we should create a heavy committing branch. I would help, this is too cool.

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1522767 from @rmuir in branch 'dev/branches/lucene5207' https://svn.apache.org/r1522767

LUCENE-5207: initial patch from ryan

asfimport commented 11 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I am not happy about the various raw "signatures" in binary form.

+1, I am not either, let's fix that. Otherwise we rely completely on tests to ensure, e.g., that every function works.

I think we should create a heavy committing branch.

I created a branch here: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene5207

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1522768 from @rmuir in branch 'dev/branches/lucene5207' https://svn.apache.org/r1522768

LUCENE-5207: setup svn:ignores and so on

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1522769 from @rmuir in branch 'dev/branches/lucene5207' https://svn.apache.org/r1522769

LUCENE-5207: eol-style native

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1522770 from @rmuir in branch 'dev/branches/lucene5207' https://svn.apache.org/r1522770

LUCENE-5207: replaceregexp tabs with spaces in generated parser

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1522771 from @rmuir in branch 'dev/branches/lucene5207' https://svn.apache.org/r1522771

LUCENE-5207: only depends on antlr-runtime

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1522772 from @rmuir in branch 'dev/branches/lucene5207' https://svn.apache.org/r1522772

LUCENE-5207: fix broken javadoc link

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1522781 from @rmuir in branch 'dev/branches/lucene5207' https://svn.apache.org/r1522781

LUCENE-5207: minor docs and API cleanups

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1522783 from @rmuir in branch 'dev/branches/lucene5207' https://svn.apache.org/r1522783

LUCENE-5207: sync up syntax docs

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1522788 from @rmuir in branch 'dev/branches/lucene5207' https://svn.apache.org/r1522788

LUCENE-5207: maven/idea config

asfimport commented 11 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Thanks Robert, I will refactor the JavaScriptFunctions class as a first step!

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1522798 from @rmuir in branch 'dev/branches/lucene5207' https://svn.apache.org/r1522798

LUCENE-5207: simplify tests

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1522805 from @uschindler in branch 'dev/branches/lucene5207' https://svn.apache.org/r1522805

LUCENE-5207: Cleanup JavascriptFunction class to use reflection and type-safe checking (to be extended)

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1522813 from @uschindler in branch 'dev/branches/lucene5207' https://svn.apache.org/r1522813

LUCENE-5207: More strict checks for invalid method signatures (currently only double is accepted as parameter or return type). The method must be static, too.
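The checks this commit describes can be sketched with plain reflection. This is not the branch's code: `FunctionChecks` and `lookup` are hypothetical names, and the public-visibility check is an extra assumption of the sketch (generated code lives in another class, so it can only call accessible methods). Because the arity is taken from `getParameterTypes().length`, it cannot drift from the real signature.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

final class FunctionChecks {
  private FunctionChecks() {}

  /** Accepts only static methods taking and returning double; returns the reflected arity. */
  static int checkFunction(Method m) {
    if (!Modifier.isStatic(m.getModifiers())) {
      throw new IllegalArgumentException(m + " is not static");
    }
    // Assumption of this sketch: the generated class must be able to access the method.
    if (!Modifier.isPublic(m.getModifiers()) || !Modifier.isPublic(m.getDeclaringClass().getModifiers())) {
      throw new IllegalArgumentException(m + " is not public");
    }
    if (m.getReturnType() != double.class) {
      throw new IllegalArgumentException(m + " does not return double");
    }
    for (Class<?> p : m.getParameterTypes()) {
      if (p != double.class) {
        throw new IllegalArgumentException(m + " has a non-double parameter: " + p);
      }
    }
    return m.getParameterTypes().length;
  }

  /** getMethod without the checked exception. */
  static Method lookup(Class<?> owner, String name, Class<?>... params) {
    try {
      return owner.getMethod(name, params);
    } catch (NoSuchMethodException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

With this, `Math.atan2` passes with arity 2, while `Math.abs(int)` (returns int) and `String.length()` (not static) are rejected.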

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1522822 from @uschindler in branch 'dev/branches/lucene5207' https://svn.apache.org/r1522822

LUCENE-5207: Add missing test for atan2 including funny values (+/-0). The original patch had a bug with this method, because the arity was wrong.
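Why signed zeros make a good atan2 test: `Math.atan2(y, x)` distinguishes `+0.0` from `-0.0` in both arguments, so swapping or dropping an argument changes the result, yet `0.0 == -0.0` is `true` and would hide a sign error. The sketch below shows the JDK semantics documented for `Math.atan2`; `Atan2Zeros` is a hypothetical helper, not the Lucene test.

```java
final class Atan2Zeros {
  private Atan2Zeros() {}

  /** Compares by bit pattern, because 0.0 == -0.0 is true and would hide a sign error. */
  static boolean sameBits(double a, double b) {
    return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
  }

  public static void main(String[] args) {
    // Signed-zero special cases of Math.atan2(y, x); argument order matters.
    System.out.println(Math.atan2(+0.0, +0.0)); // 0.0
    System.out.println(Math.atan2(-0.0, +0.0)); // -0.0
    System.out.println(Math.atan2(+0.0, -0.0)); // 3.141592653589793
    System.out.println(Math.atan2(-0.0, -0.0)); // -3.141592653589793
    System.out.println(sameBits(0.0, -0.0));    // false
  }
}
```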

asfimport commented 11 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Currently the JavascriptFunction class is very inflexible and hard to extend with custom new methods (maybe needed for Solr).

I propose to put the JavaScript name ("call") and the mapped Java method signature ("method") into a resource file. We can use the forbidden-apis signature parser to look up the methods.
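A sketch of such a resource-driven function table. It deliberately uses a simpler, hypothetical `name = owner class,method name,arity` format instead of the forbidden-apis signature syntax mentioned above, and `FunctionTable` is a hypothetical name. Since every parameter is a double, the arity alone is enough to pick the right overload.

```java
import java.io.IOException;
import java.io.StringReader;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

final class FunctionTable {
  private FunctionTable() {}

  /** Parses entries like "atan2 = java.lang.Math,atan2,2" (hypothetical format) into name -> Method. */
  static Map<String, Method> load(String propertiesText, ClassLoader loader) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(propertiesText));
    } catch (IOException e) {
      throw new IllegalArgumentException(e); // not expected for an in-memory reader
    }
    Map<String, Method> functions = new TreeMap<>(); // sorted for deterministic iteration
    for (String jsName : props.stringPropertyNames()) {
      String[] parts = props.getProperty(jsName).split(",");
      if (parts.length != 3) {
        throw new IllegalArgumentException("Bad entry for " + jsName + ": " + props.getProperty(jsName));
      }
      Class<?>[] params = new Class<?>[Integer.parseInt(parts[2].trim())];
      Arrays.fill(params, double.class); // all parameters are doubles
      try {
        Class<?> owner = Class.forName(parts[0].trim(), false, loader);
        functions.put(jsName, owner.getMethod(parts[1].trim(), params));
      } catch (ReflectiveOperationException e) {
        throw new IllegalArgumentException("Cannot resolve function " + jsName, e);
      }
    }
    return Collections.unmodifiableMap(functions);
  }
}
```

Loading `"sqrt = java.lang.Math,sqrt,1\natan2 = java.lang.Math,atan2,2"` gives an unmodifiable map with two reflected `Method` entries; a typo in a class or method name fails fast at load time instead of in generated bytecode.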

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1522839 from @uschindler in branch 'dev/branches/lucene5207' https://svn.apache.org/r1522839

LUCENE-5207: Simplifications

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1522850 from @uschindler in branch 'dev/branches/lucene5207' https://svn.apache.org/r1522850

LUCENE-5207: Remove the crazy internal signature notation and use ASM Type to generate them

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1522858 from @uschindler in branch 'dev/branches/lucene5207' https://svn.apache.org/r1522858

LUCENE-5207: Remove classloader constructor, because it makes no sense to use any other classloader

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1522873 from @uschindler in branch 'dev/branches/lucene5207' https://svn.apache.org/r1522873

LUCENE-5207: Create the class name of generated classes from the parsed text

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1522877 from @uschindler in branch 'dev/branches/lucene5207' https://svn.apache.org/r1522877

LUCENE-5207: Limit the maximum class name length

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1522888 from @uschindler in branch 'dev/branches/lucene5207' https://svn.apache.org/r1522888

LUCENE-5207: Remove stupidity... :(

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1522907 from @uschindler in branch 'dev/branches/lucene5207' https://svn.apache.org/r1522907

LUCENE-5207: Revert the dynamic class name. It's much better to use the "source file attribute". The class name is now constant (as every class gets its own class loader) and looks like an internal class of the compiler. The stack trace then looks like:

    Throwable #1: java.lang.IllegalArgumentException: foobar
        at __randomizedtesting.SeedInfo.seed([3968E8DD2901F71C:4292B9595A397818]:0)
        at org.apache.lucene.util.MathUtil.log(MathUtil.java:51)
        at org.apache.lucene.expressions.js.JavascriptCompiler$CompiledExpression.evaluate(logn(2, 0))
        at org.apache.lucene.expressions.js.TestJavascriptFunction.assertEvaluatesTo(TestJavascriptFunction.java:27)
        at org.apache.lucene.expressions.js.TestJavascriptFunction.testLognMethod(TestJavascriptFunction.java:178)
        at java.lang.Thread.run(Thread.java:724)
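Why the source file attribute reads well in traces: the JVM prints a class's SourceFile string inside the parentheses of each frame, and when the class has no line-number table it omits the `:line` suffix. The sketch below does not generate bytecode (with ASM the attribute is written via `ClassVisitor.visitSource`); it only rebuilds such a frame with the public `StackTraceElement` constructor. `SourceFileFrames` is a hypothetical name.

```java
final class SourceFileFrames {
  private SourceFileFrames() {}

  /** Frame for a method whose class has a SourceFile attribute but no line numbers (-1 = unknown). */
  static StackTraceElement frame(String className, String methodName, String sourceFile) {
    return new StackTraceElement(className, methodName, sourceFile, -1);
  }

  public static void main(String[] args) {
    StackTraceElement e = frame(
        "org.apache.lucene.expressions.js.JavascriptCompiler$CompiledExpression", "evaluate", "logn(2, 0)");
    // The "file name" slot carries the expression source text instead of a .java file.
    System.out.println(e);
    // prints org.apache.lucene.expressions.js.JavascriptCompiler$CompiledExpression.evaluate(logn(2, 0))
  }
}
```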

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1522925 from @uschindler in branch 'dev/branches/lucene5207' https://svn.apache.org/r1522925

LUCENE-5207: Add an unused test method to make sure that we get a compile error if we change the FunctionValues interface. Also make the class format version a constant for easy maintenance (once we backport)
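The idea behind the unused method, sketched with hypothetical names (`DocValuesLike` stands in for `FunctionValues`; the class-format-version constant is not shown). Generated bytecode refers to its target only through strings, which javac never checks, so a direct call in a never-invoked method turns renames and most signature changes into compile errors. Widening changes (e.g. an `int` parameter becoming `long`) would still compile, so the sketch adds a runtime descriptor cross-check as well.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

/** Hypothetical stand-in for the API that generated bytecode calls by name. */
interface DocValuesLike {
  double doubleVal(int doc);
}

final class GeneratedCallSite {
  private GeneratedCallSite() {}

  // What a bytecode generator would hard-code; javac never checks these strings.
  static final String METHOD_NAME = "doubleVal";
  static final String METHOD_DESCRIPTOR = "(I)D";

  /** Never called: it only makes javac fail if doubleVal is removed, renamed, or made incompatible. */
  @SuppressWarnings("unused")
  private static double compileTimeSignatureCheck(DocValuesLike values, int doc) {
    return values.doubleVal(doc);
  }

  /** Runtime cross-check that the hard-coded descriptor still matches the interface. */
  static boolean descriptorMatches() {
    for (Method m : DocValuesLike.class.getMethods()) {
      if (m.getName().equals(METHOD_NAME)) {
        String actual =
            MethodType.methodType(m.getReturnType(), m.getParameterTypes()).toMethodDescriptorString();
        if (actual.equals(METHOD_DESCRIPTOR)) {
          return true;
        }
      }
    }
    return false;
  }
}
```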

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1522967 from @uschindler in branch 'dev/branches/lucene5207' https://svn.apache.org/r1522967

LUCENE-5207: Remove classloader field (is not needed, we call only once)

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1522972 from @uschindler in branch 'dev/branches/lucene5207' https://svn.apache.org/r1522972

LUCENE-5207: Refactor compiler to use final fields and simplify initialization

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1523016 from @uschindler in branch 'dev/branches/lucene5207' https://svn.apache.org/r1523016

LUCENE-5207: Minor cleanups, also mark all generated methods as SYNTHETIC because there exists no source code

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1523042 from @uschindler in branch 'dev/branches/lucene5207' https://svn.apache.org/r1523042

LUCENE-5207: Update to antlr 3.5 (which produces no warnings while compiling with Java 7). Also fix the regen-macro to handle Windows file paths while replacing

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1523046 from @uschindler in branch 'dev/branches/lucene5207' https://svn.apache.org/r1523046

LUCENE-5207: replace tabs by 2 spaces now. antlr 3.5 produces tabs consistently now, so we can replace them (no mixed tabs anymore)

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1523047 from @rmuir in branch 'dev/branches/lucene5207' https://svn.apache.org/r1523047

LUCENE-5207: upgrade checksum/maven

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1523057 from @rmuir in branch 'dev/branches/lucene5207' https://svn.apache.org/r1523057

LUCENE-5207: try a hack around antlr hashmap bugs

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1523059 from @rmuir in branch 'dev/branches/lucene5207' https://svn.apache.org/r1523059

LUCENE-5207: enforce encoding and locale (for paranoia reasons)

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1523075 from @uschindler in branch 'dev/branches/lucene5207' https://svn.apache.org/r1523075

LUCENE-5207: add comment that the regen hack does not work in Java 8

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1523114 from @uschindler in branch 'dev/branches/lucene5207' https://svn.apache.org/r1523114

LUCENE-5207: load available javascript functions from resource file (properties)

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1523204 from @uschindler in branch 'dev/branches/lucene5207' https://svn.apache.org/r1523204

LUCENE-5207: Throw correct exception in JavascriptFunction ctor

asfimport commented 11 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I think the branch is now in a quite good state.

I thought a little bit about the extensibility of Javascript functions: the current code is more or less "hardcoded" (although it's configurable by a resource file in the JAR). But this is not extensible by users (everything is private; you have to rebuild the JAR).

My idea would be to extend Bindings to also allow to get "methods". So one could register a java.lang.reflect.Method binding. The code would check that it only accepts "double" as parameters and returns a "double". Solr could then use this to register stuff like "haversine". The compiler would be extended to first try to get a binding for the function name (if it is a Method instance) otherwise it falls back to the builtins.

Another thing to discuss (which is not yet a problem): if we allow function binding and, e.g., a Solr contrib loaded by SolrResourceLoader registers a function, the generated bytecode would fail to find the function. This is because we use the classloader of the JavaScript module, not the caller's. We should either make this configurable or (which I like better) include it in the classloader setup when somebody registers a custom function, so the custom function's implementation class can be found by the JVM.

If we make the classloader configurable (like it was in the original patch), we must also check that our own classloader is a parent of the given one.
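The parent check Uwe mentions can be sketched by walking the supplied loader's parent chain. `LoaderChecks` and `checkParent` are hypothetical names; treating `null` (the bootstrap loader) as an ancestor of every loader is an assumption that matches how the JDK represents it.

```java
final class LoaderChecks {
  private LoaderChecks() {}

  /** True if ancestor is loader itself or appears in its parent chain; null (bootstrap) is everyone's ancestor. */
  static boolean isAncestorOrSelf(ClassLoader ancestor, ClassLoader loader) {
    if (ancestor == null) {
      return true;
    }
    for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
      if (cl == ancestor) {
        return true;
      }
    }
    return false;
  }

  /** Rejects a caller-supplied loader from which the compiler's own classes would be invisible. */
  static ClassLoader checkParent(Class<?> compilerClass, ClassLoader supplied) {
    if (!isAncestorOrSelf(compilerClass.getClassLoader(), supplied)) {
      throw new IllegalArgumentException(
          "The supplied classloader is not a child of the one that loaded " + compilerClass.getName());
    }
    return supplied;
  }
}
```

A `URLClassLoader` created with the compiler's loader as parent passes; one created with a `null` parent (bootstrap only) is rejected, because the generated class could not resolve the module's types through it.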

asfimport commented 11 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Uwe: can we defer that function stuff to another issue?

The whole discussion about classloader stuff doesn't make a lot of sense to me. Today functions are implemented with just a normal invokeVirtual. Bindings aren't used until you actually run the query. Today an expression interacts with the Bindings not through invokeVirtual: a binding is just another ValueSource coming into its array at runtime.

So I don't think custom functions should go through Bindings; that is totally unrelated to how that class works. I also don't want to make the API very confusing just because Solr does crazy things with classloaders, at least not yet: I think this is very close to usable for Lucene users as-is.
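A conceptual model of Robert's point, written by hand rather than generated: functions are fixed calls decided when the expression is compiled, while variables are just slots in an array that the caller fills from its Bindings at search time. `VariableValues` and `HandCompiledExpression` are hypothetical; the module's generated classes use their own types (the thread mentions FunctionValues). The functions are shown as direct static calls, consistent with the earlier commit requiring static methods.

```java
/** Hypothetical per-variable value source, resolved from the Bindings only at search time. */
interface VariableValues {
  double get(int doc);
}

/** Hand-written equivalent of a compiled "sqrt(_score) + ln(popularity)" (conceptual model only). */
final class HandCompiledExpression {
  private HandCompiledExpression() {}

  /** Variable names in slot order; compilation records names only and never sees the Bindings. */
  static final String[] VARIABLES = {"_score", "popularity"};

  /** sqrt and ln are direct static calls; variable values come from the caller-supplied array. */
  static double evaluate(int doc, VariableValues[] values) {
    return Math.sqrt(values[0].get(doc)) + Math.log(values[1].get(doc));
  }
}
```

With `_score` bound to a source returning 4.0 and `popularity` to one returning 1.0, `evaluate` computes sqrt(4) + ln(1), i.e. 2.0; swapping in different bindings never requires recompiling.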

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1523279 from @rmuir in branch 'dev/branches/lucene5207' https://svn.apache.org/r1523279

LUCENE-5207: add resources folder here

asfimport commented 11 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I thought a little bit about the extensibility of Javascript functions: the current code is more or less "hardcoded" (although it's configurable by a resource file in the JAR). But this is not extensible by users (everything is private; you have to rebuild the JAR).

That's not exactly true. Someone can plug in their own compiler that compiles String->Expression in some other way.

If we want to make this particular one more extensible, then we should give it ctors or protected methods to do that.

This is easy to do and does not involve classloader hell. Doing functions through Bindings would be both slow and wrong.

asfimport commented 11 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Hi Robert, sorry for being a little confusing: the two issues (the classloader and function extensibility) don't really have anything to do with each other; they are mostly separate. With the current code, with private ctors and static functions, there is indeed no need for a custom classloader. This is why I argued to remove it.

The classloader also has nothing to do with Bindings. The Bindings were just an idea for making it "easy" for users to register their own functions in a non-static way. I was under the impression that the bindings are available while compiling. I don't want it "dynamic"; I still want it statically compiled. The idea was to add the reflected Method to the bindings, so the method can be used when compiling the bytecode (not looked up per call). After reviewing the code again, I can see that the bindings are not available at compile time. That was the misunderstanding, sorry (but it could still be changed to support this). It just looked elegant to treat a function like a binding.

The current patch is fine, let's commit it and do all other stuff like extending custom functions later. For that we have to make the whole JavascriptFunction interface public and non-static (currently its a singleton-like factory). But this can be done in another issue.

The classloader problem comes into play when users are able to register their own functions. The problem here is that the classloader used to load the ASM-generated class has lucene-core's classloader as its parent (because we use the compiler's own classloader: this.getClass().getClassLoader()). That classloader does not necessarily have access to anything in a different classloader (e.g. a plugin in ElasticSearch or Solr).

Finally, it's simple: once we allow foreign, user-defined functions, we must add the ClassLoader argument to the compiler again; otherwise you cannot register methods from classes in foreign classloaders. An alternative would be to let the compiler figure it out itself: you pass a list of java.lang.reflect.Method objects and it chooses the correct classloader automatically.
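One way the "choose the classloader automatically" alternative could work, sketched under the assumption that all relevant loaders lie on a single parent chain: start from the compiler's own loader and move to a function's declaring loader whenever that loader is a descendant of the current choice, failing if two loaders are unrelated. `LoaderChooser` is a hypothetical name; this is not what the branch implemented.

```java
import java.lang.reflect.Method;
import java.util.Collection;

final class LoaderChooser {
  private LoaderChooser() {}

  /** Returns the most specific loader from which the compiler and every function's class are visible. */
  static ClassLoader choose(ClassLoader own, Collection<Method> functions) {
    ClassLoader best = own;
    for (Method m : functions) {
      ClassLoader candidate = m.getDeclaringClass().getClassLoader();
      if (isAncestorOrSelf(candidate, best)) {
        continue; // already visible from the current choice
      }
      if (isAncestorOrSelf(best, candidate)) {
        best = candidate; // more specific, and still sees everything chosen so far
      } else {
        throw new IllegalArgumentException("Classloader of " + m + " is unrelated to the compiler's");
      }
    }
    return best;
  }

  /** Null stands for the bootstrap loader, which is an ancestor of every loader. */
  private static boolean isAncestorOrSelf(ClassLoader ancestor, ClassLoader loader) {
    if (ancestor == null) {
      return true;
    }
    for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
      if (cl == ancestor) {
        return true;
      }
    }
    return false;
  }
}
```

Builtins such as `Math.sqrt` (bootstrap loader) never change the choice; a method from a plugin's child loader moves the choice down to that loader.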

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1523286 from @rmuir in branch 'dev/branches/lucene5207' https://svn.apache.org/r1523286

LUCENE-5207: make functions pluggable

asfimport commented 11 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Hi Uwe: please see my commit (we can revert it if we have to, but I think it's an easy step).

The idea is: JavascriptFunction as a class provides nothing really, it is nothing more than a Method with some extra checks in its ctor.

So I don't think this class needs to exist at all, and the default list is nothing but an unmodifiable Map<String,Method>.

So in addition to:

```java
public static Expression compile(String sourceText);
```

I added:

```java
public static Expression compile(String sourceText, Map<String,Method> functions) throws ParseException {
  for (Method m : functions.values()) {
    checkFunction(m);
  }
  return new JavascriptCompiler(sourceText, functions).compileExpression();
}
```

I will add some tests, but this is very simple and allows someone to expose whatever methods they want, not just add but also remove :)
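A sketch of how a caller might build the map for the `compile(String, Map<String,Method>)` overload: copy an unmodifiable default table, add a custom function, and remove a builtin. `CustomFunctionMaps.DEFAULTS` is a hypothetical stand-in for the compiler's default table (its entries here are illustrative), and `MyFunctions.clamp` is a made-up custom function. `MyFunctions` is package-private only so the sketch fits in one file; generated code in another class would need it to be public.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical user class holding an extra function (would need to be public in real use). */
final class MyFunctions {
  private MyFunctions() {}

  /** Example custom function: clamps x into [lo, hi]. */
  public static double clamp(double x, double lo, double hi) {
    return Math.max(lo, Math.min(hi, x));
  }
}

final class CustomFunctionMaps {
  private CustomFunctionMaps() {}

  /** Hypothetical stand-in for the compiler's unmodifiable default function table. */
  static final Map<String, Method> DEFAULTS;

  static {
    Map<String, Method> m = new HashMap<>();
    m.put("sqrt", lookup(Math.class, "sqrt", 1));
    m.put("ln", lookup(Math.class, "log", 1));
    m.put("atan2", lookup(Math.class, "atan2", 2));
    DEFAULTS = Collections.unmodifiableMap(m);
  }

  /** Copies the defaults, adds one custom function, and removes one builtin. */
  static Map<String, Method> customized() {
    Map<String, Method> functions = new HashMap<>(DEFAULTS);
    functions.put("clamp", lookup(MyFunctions.class, "clamp", 3));
    functions.remove("atan2");
    return functions;
  }

  /** Finds a public method whose parameters are all double. */
  static Method lookup(Class<?> owner, String name, int arity) {
    Class<?>[] params = new Class<?>[arity];
    Arrays.fill(params, double.class);
    try {
      return owner.getMethod(name, params);
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException(e);
    }
  }
}
```

The defaults stay untouched (attempts to modify them throw `UnsupportedOperationException`), while the customized copy exposes `clamp` and no longer exposes `atan2`.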

asfimport commented 11 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Finally, it's simple: once we allow foreign, user-defined functions, we must add the ClassLoader argument to the compiler again; otherwise you cannot register methods from classes in foreign classloaders. An alternative would be to let the compiler figure it out itself: you pass a list of java.lang.reflect.Method objects and it chooses the correct classloader automatically.

I think we can add this to the second method signature?

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1523296 from @rmuir in branch 'dev/branches/lucene5207' https://svn.apache.org/r1523296

LUCENE-5207: add some simple tests for custom functions

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1523297 from @rmuir in branch 'dev/branches/lucene5207' https://svn.apache.org/r1523297

LUCENE-5207: allow specifying classloader when using custom functions

asfimport commented 11 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1523300 from @rmuir in branch 'dev/branches/lucene5207' https://svn.apache.org/r1523300

LUCENE-5207: add some checks and tests for illegal stuff