JuliaInterop / JavaCall.jl

Call Java from Julia
http://juliainterop.github.io/JavaCall.jl
Other
118 stars 53 forks source link

Proposal: jdcall and jtypeforclass #93

Open clouds56 opened 5 years ago

clouds56 commented 5 years ago
  1. Now the get class for name is compatible with Array and primitive types

    https://github.com/JuliaInterop/JavaCall.jl/blob/d6c3678d3a93fae903fb3ab1cb1ea7ea1fc91f92/src/convert.jl#L321-L325 https://github.com/JuliaInterop/JavaCall.jl/blob/d6c3678d3a93fae903fb3ab1cb1ea7ea1fc91f92/src/reflect.jl#L163-L168 This method may help

    function jtypeforclass(cls::JClass)
    isarray(cls) = jcall(cls, "isArray", jboolean, ()) != 0x00
    if isarray(cls)
        jcomponentcls = jcall(cls, "getComponentType", JClass, ())
        return Array{jtypeforclass(jcomponentcls), 1}
    end
    name = getname(cls)
    if name == "void"
        Nothing
    elseif name == "boolean"
        jboolean
    elseif name == "char"
        jchar
    elseif name == "short"
        jshort
    elseif name == "float"
        jfloat
    elseif name == "double"
        jdouble
    elseif name == "int"
        jint
    elseif name == "long"
        jlong
    else
        JavaObject{Symbol(name)}
    end
    end

and the narrow would be

function narrow(obj::JavaObject)
    c = jcall(obj,"getClass", JClass, ())
    return convert(jtypeforclass(c), obj)
end
narrow(obj::JavaCall.jprimitive) = obj
  1. And it would be possible to call functions without specific return type and argument type
convertible(javatype::Type, juliatype::Type) = hasmethod(convert, Tuple{Type{javatype}, juliatype})

function findmethod(obj::Union{JavaObject{C}, Type{JavaObject{C}}}, name::AbstractString, args...) where C
    allmethods = listmethods(obj, name)
    filter(allmethods) do m
        params = getparametertypes(m)
        if length(params) != length(args)
            return false
        end
        all([convertible(jtypeforclass(c), typeof(a)) for (c, a) in zip(getparametertypes(m), args)])
    end
end

function jdcall(obj::Union{JavaObject{C}, Type{JavaObject{C}}}, name::AbstractString, args...) where C
    matchmethods = findmethod(obj, name, args...)
    if length(matchmethods) == 0
        allmethods = listmethods(obj, name)
        candidates = join(allmethods, "\n  ")
        error("no match methods $name for $obj, candidates are:\n  $candidates")
    elseif length(matchmethods) > 1
        candidates = join(matchmethods, "\n  ")
        error("multiple methods $name for $obj, candidates are:\n  $candidates")
    end
    matchmethod = matchmethods[1]
    rettype = jtypeforclass(getreturntype(matchmethod))
    argstype = tuple(map(jtypeforclass, getparametertypes(matchmethod))...)
    # println("type: $rettype ($argstype)")
    return jcall(obj, name, rettype, argstype, args...)
end

Since narrow and classforname are both exported, so this may be a breaking change, although I think the behavior of the two methods now is strange when it comes to array and/or jint.

dfdx commented 5 years ago

JVM internals aren't that elegant, sometimes you really, really want to call a very specific method. For example, in Spark.jl I once encountered two methods of the same object with the same list of parameter types, but with different return types (which was a result of tricky inheritance and/or Java/Scala compiler differences). Anyway, the only way to call exact method I wanted was to pass exact list of types to JNI, so it's unlikely we will be able to replace jcall.

Are you concerned with the need to specify argument types? If so, have a look at #91 which provides a more high-level API with Java-like syntax and automatic jcall generation (implemented similar to what you propose here). For example, when it's done, you'd able to write something like:

a = JProxy(@jimport(java.util.ArrayList)(()))
a.add("hello")
a.add(1)
a.get(0) == "hello"
a[1] == "hello"
a[2] == 1

And at the same time it will preserve low-level API of jcall. Does it fit your needs?

clouds56 commented 5 years ago
  1. No, we could not override only by return type in Java. I thought you mean convertible parameter types like

    • void org.apache.spark.sql.Dataset.show(int)
    • void org.apache.spark.sql.Dataset.show(boolean) they have different parameter list while in JavaCall.jl, you have to specific parameter type
      jcall(dataset, "show", Nothing, (jint,), 10)
      jcall(dataset, "show", Nothing, (jboolean,), true)

      ANSWER: yes, if we meet case like this (multiple match found), the jdcall would not choose the method for you, you have to use the jcall explicitly.

  2. by adding jdcall, we could achieve chain call like this

    @spark df.groupBy(["match_type"]...).agg([
    @col(lit(int_obj(1)).count().alias("count")),
    @col(won.when(bool_obj(true)).count().alias("won"))]...).sort([@col won.desc()]).toJSON().take(200)

    Notice here we do not have a line @jimport org.apache.spark.sql.RelationalGroupedDataset (and have not used any methods in Spark.jl), we do not have to know the intermediate type as what we could do in scala (see the gist for more information)

  3. and for JProxy, I think it is a wonderful API but not the same direction with jdcall which I proposed here. (I'm not familiar with JProxy, correct me if I'm wrong)

    • JProxy would automatically wrap all methods using genMethods, which may take some time compiling for complex class.
    • JProxy would also encounter the problem when multiple methods in java matches (I don't know how it handle the case), and even worse, it may have conflict in julia side, may not?
    • jdcall would be faster and has much more small memory footprint if only one method of the class would be called.
    • the code of jdcall is clean, it is just a variant/extension of jcall
  4. I think we could keep the both, even after #91 merged, we would never remove jcall api.

    • developer could use JProxy in binding package while use jdcall/jcall in prototype.
clouds56 commented 5 years ago

TL; DR JProxy is heavy while jdcall is light. JProxy provides a completely new way to interact with java object jdcall is just a improvement to jcall (and is still low-level API)

I would love JProxy while I still propose jdcall

dfdx commented 5 years ago

by adding jdcall, we could achieve chain call like this

First of all, this looks really great! Having said that, I still think you and @zot work on a very similar approaches and your Spark example could be a good demo for JProxy as well. What I'm concerned with is that we may end up in multiple intersecting APIs which would be confusing to occasional users.

I like simplicity and transparency of jdcall, but it has one important drawback - you call findmethod and other stuff every time before actual jcall. For Spark it's likely to be a little fraction of work on JVM side, but for other projects it might be a huge overhead.

To fix it, one could add a method cache and bind it to an object representing loaded class. At this point it starts looking similar to JProxy approach (though Bill may add more context - I only roughly track implementation changes).

I hope this doesn't discourage you - I really enjoy the described future, just want to keep implementation sound and robust so we could maintain it for many years without breaking things.

dfdx commented 5 years ago

By the way, regarding method signatures with different return types:

// A.java
class A {

    public Object foo() {
    return true;
    }
}

// B.java
class B extends A {

    public Integer foo() {
    return 42;
    }
}
julia> using JavaCall

julia> JavaCall.init()

julia> JB = @jimport B
JavaObject{:B}

julia> listmethods(JB)
11-element Array{JavaObject{Symbol("java.lang.reflect.Method")},1}:
 java.lang.Integer foo()         
 java.lang.Object foo()          
...

There are now 2 methods with the same signature except for return type, so your findmethod will raise an error. But in general it still works fine - calling a version with JObject return type will invoke correct version from class B, so it's not really funny.

Let's make it a bit more interesting and redefine A.java as:

class A {

    public boolean foo() {
    return true;
    }
}

Recompile A.java, but not B.java. This may happen e.g. when combing in runtime different versions of JARs.

julia> using JavaCall

julia> JavaCall.init()

julia> JB = @jimport B
JavaObject{:B}

julia> listmethods(JB)
12-element Array{JavaObject{Symbol("java.lang.reflect.Method")},1}:
 java.lang.Integer foo()         
 java.lang.Object foo()
...

JB still lists 2 method it saw during compilation. Let's call them:

julia> JInteger = @jimport java.lang.Integer;

julia> r = jcall(b, "foo", JInteger, ())
JavaObject{Symbol("java.lang.Integer")}(Ptr{Nothing} @0x0000000002733288)

julia> convert(jint, r)
42

So far so good.

julia> r = jcall(b, "foo", JObject, ())
JavaObject{Symbol("java.lang.Object")}(Ptr{Nothing} @0x00000000027332a0)

julia> convert(jint, convert(JInteger, r))
42

Still correct - method from B.

julia> jcall(b, "foo", jboolean, ()) |> Bool
true

WAT? Now we have 2 methods with exactly the same name and argument list, but with even incompatible return types!

You might think it's just a contrived example, but as I mentioned earlier I've encountered something very similar in practice in Spark.jl - I spent a couple of hours figuring out correct signature for jcall when even listmethods didn't show the one I needed.

clouds56 commented 5 years ago

Interesting point. I think the limit would also apply to JProxy since it also uses listmethods to find methods. I wonder how it would handle this. And more than that, what would happen if I call

b=JProxy(@jimport(B)(()))
b.foo() # should it be 42 or true?

For the drawback that calling listmethods every time is expensive, I think we could cache methods we found (not caching the whole method list).

"cached methods here"
function jdcall_cached end

function jdcall_cache_clear!()
    for m in methods(jdcall_cached)
        Base.delete_method(m)
    end
end

function jdcall_cache(obj::Union{JavaObject{C}, Type{JavaObject{C}}}, name::AbstractString, args...) where C
    matchmethods = findmethod(obj, name, args...)
    if length(matchmethods) == 0
        allmethods = listmethods(obj, name)
        candidates = join(allmethods, "\n  ")
        error("no match methods $name for $obj, candidates are:\n  $candidates")
    elseif length(matchmethods) > 1
        candidates = join(matchmethods, "\n  ")
        error("multiple methods $name for $obj, candidates are:\n  $candidates")
    end
    matchmethod = matchmethods[1]
    rettype = jtypeforclass(getreturntype(matchmethod))
    argstype = tuple(map(jtypeforclass, getparametertypes(matchmethod))...)

    # cache the method found
    args_julia = Symbol.(string.("arg", 1:length(args)))
    params_julia = [:($name::$type) for (name, type) in zip(args_julia, typeof.(args))]
    eval(:(function jdcall_cached(base::Union{JavaObject{$(QuoteNode(C))}, Type{JavaObject{$(QuoteNode(C))}}}, ::Val{$(QuoteNode(Symbol(name)))}, $(params_julia...))
      jcall(base, $name, $rettype, $argstype, $(args_julia...))
    end))
    return rettype, argstype
end

function jdcall(obj::Union{JavaObject{C}, Type{JavaObject{C}}}, name::AbstractString, args...) where C
    if !applicable(jdcall_cached, obj, Val(Symbol(name)), args...)
        rettype, argstype = jdcall_cache(obj, name, args...)
        return jcall(obj, name, rettype, argstype, args...)
    end
    @info("using cached $(@which jdcall_cached(obj, Val(Symbol(name)), args...))")
    return jdcall_cached(obj, Val(Symbol(name)), args...)
end

If you use some class heavily, the JProxy should be your choice.

There remains some issues.

clouds56 commented 5 years ago

I realized that the dispatch/findmethod relies on hasmethod(convert, ...), which means the founded method may differ if more convert function defined and thus jdcall_cached goes stale. Even worse, as introducing more convert functions, some worked code would be broken. image that someone wrote a function like this

convert(x, y) = error("Try to convert $(y) to $(x), this should not happen")

(at the same time jdcall could take the advantage of user defined convert to more flexibly dispatch method)

Maybe we should make "julia to JavaObject" convert more robust.

dfdx commented 5 years ago

We may also encounter issues with this piece:

eval(:(function jdcall_cached(base::Union{JavaObject{$(QuoteNode(C))}, Type{JavaObject{$(QuoteNode(C))}}}, ::Val{$(QuoteNode(Symbol(name)))}, $(params_julia...))
      jcall(base, $name, $rettype, $argstype, $(args_julia...))
    end))

if we try to use newly defined method at the same world age as we called eval (see e.g. this discussion). We can overcome it in several ways:

  1. Push method definition as early as possible, e.g. in @jimport. IIRC, JProxy aims to do this.
  2. Use Base.invokelatest to call a method from future worlds. This adds constant overhead though.
  3. Avoid defining new Julia methods and dispatch dynamically. Should be comparable with (2) in performance.
clouds56 commented 5 years ago

There still an option 4 that use global Dict{Tuple{JavaObject{T}, Symbol}, Any} to store cached functions. Not sure if the performance would be better/worse than Base.invokelatest.

I think the issue also applies for JProxy. Consider running

JProxy(@jimport(java.util.ArrayList)(())).getClass().getClassLoader().toString()

when should I make the proxy for java.lang.ClassLoader? If as early as possible, it would run into trouble that we would proxy lots of classes (only if they are dependencies of some method of java.util.ArrayList). If as lazy as when got a return type that is ClassLoader, it would have same trouble as long as JProxy use eval to gen code.

According to the author of JProxy, the gen.jl is not used now. JProxy is now doing dynamic dispatch (just like jdcall) in current PR.

zot commented 5 years ago

Yes, it is doing dynamic dispatch and gen.jl is not currently used but JProxy caches the method ids in JClassInfo structures so that saves a lot of work. Once you make a proxy, the class info is saved so it's around the next time you make one.

JProxy only caches classinfo for its class and the ancestors (classes and interfaces). The classes of the method return types and arguments can cause subtypes of java_lang to be defined but not classinfo data. I.e. the method dictionaries are only populated for the JProxy data's class and its ancestors, not for the return types or arguments of the methods it finds. Otherwise it would end up "faulting in" large parts of the JDK when you referenced just about anything (I think one of my earlier versions did cache classinfo for return types and arguments though).

I'm not sure about the eval and invokelatest issue -- is it possible to defer code to run in a later Julia world? To solve a problem like this in JavaScript, I'd do something like:

function makeDefs(defsFunc, thenFunc) {
    defsFunc();
    setTimeout(thenFunc, 1); // run thenFunc later
}

Maybe yieldto() will work for that? If that works, then you can just defer later code whenever you need to make more definitions. In any case you can just use @class(a.b.c) to create a static proxy for a.b.c and that will do all the caching.

By the way, you can use a static proxy as a constructor, in addition to providing access to static members, like this:

AL = @class(java.util.ArrayList)
arrayList1 = AL()
arrayList1.add(1)
println(AL(arrayList1))
clouds56 commented 5 years ago

I love the idea that use macro to do static dispatch. I'm working with ClassLoader now and found that caching would mess things up. Two classes with the same name/identity may be different class (since different ClassLoader, see here)

zot commented 5 years ago

Thanks! I just expanded the idea of JavaCall's @jimport() macro a little bit there.

@class() is really just syntactic sugar for a call to staticproxy(classname).

I just added staticproxy(JClass) and staticproxy(JProxy{java_lang_Class, false}) so you can easily make static proxies on class objects to help with using different class loaders. Now you can say something like staticproxy(x.getClass().getClassLoader().loadClass(classname)).

This will help because staticproxy(classname) (with only a string argument) uses reflect.jl's classforname() function, which uses the context class loader:

function classforname(name::String)
    thread = jcall(JThread, "currentThread", JThread, ())
    loader = jcall(thread, "getContextClassLoader", JClassLoader, ())
    return jcall(JClass, "forName", JClass, (JString, jboolean, JClassLoader),
                 name, true, loader)
end