Simn / genjvm


[boss battle] invokedynamic #12

Simn closed this issue 5 years ago

Simn commented 5 years ago

We really have to understand invokedynamic. The idea would be to use that whenever we call something but aren't sure what we're calling.

My concern is that this alone is not enough to cover our semantics: for every unknown.call() there is also (unknown.call)(), which first reads the field and then calls whatever it got. However, maybe these are just cascading problems: the dynamic field access would return "something", and then invokedynamic would figure out how to call it.

Relevant documentation:

What I understood so far is that this is a two-step process:

  1. When the JVM links a given invokedynamic instruction (typically the first time it is executed), it calls a bootstrap method, which returns a CallSite object that is then associated with that instruction.
  2. On every execution after that, the JVM invokes the CallSite's current target, which is a MethodHandle.
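In plain Java the two steps can be simulated by hand, since javac does not let us write an arbitrary invokedynamic instruction. This is a minimal sketch, assuming a static bootstrap method with the standard (Lookup, String, MethodType) leading parameters; the names TwoStepDemo, greet and bootstrap are hypothetical. A real call site would be linked by the JVM rather than by calling bootstrap directly.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class TwoStepDemo {
    // Hypothetical method that the call site should end up calling.
    static String greet(String who) {
        return "hello " + who;
    }

    // Step 1: a bootstrap method. The JVM would call something like this once,
    // when it links the call site, and keep the returned CallSite.
    static CallSite bootstrap(MethodHandles.Lookup caller, String name, MethodType type)
            throws ReflectiveOperationException {
        MethodHandle target = caller.findStatic(caller.lookupClass(), name, type);
        return new ConstantCallSite(target);
    }

    static String run() {
        try {
            MethodType type = MethodType.methodType(String.class, String.class);
            // Simulated linkage: call the bootstrap method by hand.
            CallSite site = bootstrap(MethodHandles.lookup(), "greet", type);
            // Step 2: each execution invokes the call site's current target.
            MethodHandle invoker = site.dynamicInvoker();
            return (String) invoker.invokeExact("world");
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // hello world
    }
}
```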
Simn commented 5 years ago

Let's see...

invokedynamic itself is just:

```
invokedynamic
indexbyte1
indexbyte2
0
0
```

> First, the unsigned indexbyte1 and indexbyte2 are used to construct an index into the run-time constant pool of the current class (§2.6), where the value of the index is (indexbyte1 << 8) | indexbyte2. The run-time constant pool item at that index must be a symbolic reference to a call site specifier (§5.1). The values of the third and fourth operand bytes must always be zero.
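The index arithmetic is small enough to show directly. A hypothetical Java helper that decodes the operands of an invokedynamic (opcode 0xba) at a given offset in a method's code array could look like this:

```java
class IndyOperands {
    // Decodes the four operand bytes following the invokedynamic opcode (0xba)
    // and returns the run-time constant pool index they encode.
    static int constantPoolIndex(byte[] code, int opcodeOffset) {
        if ((code[opcodeOffset] & 0xff) != 0xba) {
            throw new IllegalArgumentException("not an invokedynamic opcode");
        }
        int indexbyte1 = code[opcodeOffset + 1] & 0xff; // bytes are unsigned here
        int indexbyte2 = code[opcodeOffset + 2] & 0xff;
        if (code[opcodeOffset + 3] != 0 || code[opcodeOffset + 4] != 0) {
            throw new IllegalArgumentException("third and fourth operand bytes must be zero");
        }
        return (indexbyte1 << 8) | indexbyte2;
    }

    public static void main(String[] args) {
        byte[] code = { (byte) 0xba, 0x01, 0x2c, 0x00, 0x00 };
        System.out.println(constantPoolIndex(code, 0)); // 300
    }
}
```

With indexbyte1 = 0x01 and indexbyte2 = 0x2c this gives (0x01 << 8) | 0x2c = 256 + 44 = 300.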

This gives us the CONSTANT_InvokeDynamic_info structure:

```
CONSTANT_InvokeDynamic_info {
    u1 tag;
    u2 bootstrap_method_attr_index;
    u2 name_and_type_index;
}
```

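With bootstrap_method_attr_index we can look up an entry in the bootstrap_methods table of the class's BootstrapMethods attribute:

```
BootstrapMethods_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 num_bootstrap_methods;
    {   u2 bootstrap_method_ref;
        u2 num_bootstrap_arguments;
        u2 bootstrap_arguments[num_bootstrap_arguments];
    } bootstrap_methods[num_bootstrap_methods];
}
```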
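Reading the bootstrap_methods table is mechanical, since every field of an entry is a big-endian u2. A sketch, assuming we already have the attribute body that follows attribute_name_index and attribute_length (so it starts at num_bootstrap_methods); the class name BootstrapMethodsParser is made up, and it only collects constant pool indices without resolving anything.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BootstrapMethodsParser {
    // One bootstrap_methods entry: raw constant pool indices only.
    static final class Entry {
        final int bootstrapMethodRef;
        final int[] bootstrapArguments;

        Entry(int bootstrapMethodRef, int[] bootstrapArguments) {
            this.bootstrapMethodRef = bootstrapMethodRef;
            this.bootstrapArguments = bootstrapArguments;
        }
    }

    // Parses u2 num_bootstrap_methods followed by that many entries.
    // DataInputStream reads big-endian, matching the class file format.
    static List<Entry> parse(byte[] body) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(body))) {
            int count = in.readUnsignedShort();
            List<Entry> entries = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                int ref = in.readUnsignedShort();
                int numArgs = in.readUnsignedShort();
                int[] args = new int[numArgs];
                for (int j = 0; j < numArgs; j++) {
                    args[j] = in.readUnsignedShort();
                }
                entries.add(new Entry(ref, args));
            }
            return entries;
        } catch (IOException e) {
            // A truncated body surfaces here as an EOFException.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // num_bootstrap_methods = 1; entry: method handle #5, static arguments #7 and #9
        byte[] body = { 0, 1, 0, 5, 0, 2, 0, 7, 0, 9 };
        Entry e = parse(body).get(0);
        System.out.println(e.bootstrapMethodRef + " " + Arrays.toString(e.bootstrapArguments)); // 5 [7, 9]
    }
}
```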

For each element in bootstrap_arguments we grab something from the constant pool:

> The constant_pool entry at that index must be a CONSTANT_String_info, CONSTANT_Class_info, CONSTANT_Integer_info, CONSTANT_Long_info, CONSTANT_Float_info, CONSTANT_Double_info, CONSTANT_MethodHandle_info, or CONSTANT_MethodType_info structure

The bootstrap_method_ref indexes back into the constant pool to get us a CONSTANT_MethodHandle_info structure:

```
CONSTANT_MethodHandle_info {
    u1 tag;
    u1 reference_kind;
    u2 reference_index;
}
```

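In the usual case the reference_kind here is REF_invokeStatic (6) (the spec also permits REF_newInvokeSpecial (8)), and the reference_index then finds us this in the constant pool (or a CONSTANT_InterfaceMethodref_info for a static interface method):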

```
CONSTANT_Methodref_info {
    u1 tag;
    u2 class_index;
    u2 name_and_type_index;
}
```
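The numeric reference kinds are also exposed in java.lang.invoke.MethodHandleInfo, whose REF_* constants use the same values as CONSTANT_MethodHandle_info. As a quick check, a direct handle to a static method can be cracked back into its symbolic form with Lookup.revealDirect; the class name and the twice method are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandleInfo;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class ReferenceKindDemo {
    static int twice(int x) {
        return 2 * x;
    }

    // Creates a direct handle to a static method and cracks it again.
    static MethodHandleInfo crack() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle mh = lookup.findStatic(ReferenceKindDemo.class, "twice",
                    MethodType.methodType(int.class, int.class));
            return lookup.revealDirect(mh);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        MethodHandleInfo info = crack();
        System.out.println(info.getReferenceKind()); // 6
        System.out.println(MethodHandleInfo.referenceKindToString(info.getReferenceKind()));
        System.out.println(info.getName()); // twice
    }
}
```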

Then the next part of the invokedynamic documentation is:

> The call site specifier is resolved (§5.4.3.6) for this specific dynamic call site to obtain a reference to a java.lang.invoke.MethodHandle instance, a reference to a java.lang.invoke.MethodType instance, and references to static arguments.

I understand that we have the MethodHandle and the static arguments, but I'm not sure where the MethodType comes from here.
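For reference: per JVMS §5.4.3.6, the MethodType is derived from the method descriptor reached through the name_and_type_index of the CONSTANT_InvokeDynamic_info. In Java, such a descriptor string can be turned into a MethodType with MethodType.fromMethodDescriptorString; the descriptor below is made up for illustration.

```java
import java.lang.invoke.MethodType;

class DescriptorDemo {
    // In a real class file the descriptor would come from the
    // CONSTANT_NameAndType_info referenced by name_and_type_index.
    static MethodType fromDescriptor(String descriptor) {
        return MethodType.fromMethodDescriptorString(descriptor,
                DescriptorDemo.class.getClassLoader());
    }

    public static void main(String[] args) {
        MethodType t = fromDescriptor("(ILjava/lang/String;)Ljava/lang/Object;");
        System.out.println(t); // (int,String)Object
        System.out.println(t.toMethodDescriptorString()); // (ILjava/lang/String;)Ljava/lang/Object;
    }
}
```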

At this point it finally calls something:

> Next, as part of the continuing resolution of the call site specifier, the bootstrap method is invoked as if by execution of an invokevirtual instruction (§invokevirtual) that contains a run-time constant pool index to a symbolic reference to a method (§5.1) with the following properties:

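It goes on to describe the method that is invoked. Transliterated to Haxe, with the bootstrap method handle itself as the first argument (it is the receiver of that invokevirtual), the signature is:

```haxe
static function invoke(handle:MethodHandle, lookup:MethodHandles.Lookup, name:String, type:MethodType, rest:Rest<Dynamic>):CallSite
```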
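Since the first parameter in that signature is the receiver, the bootstrap method we actually write starts at the Lookup parameter. A minimal Java sketch of that invocation, assuming a bootstrap method with one static argument (bsm, add and the static argument value are hypothetical):

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class BootstrapInvokeDemo {
    static int add(int a, int b) {
        return a + b;
    }

    // The bootstrap method itself: no MethodHandle parameter here.
    // staticArg stands in for one entry of bootstrap_arguments (unused in this sketch).
    static CallSite bsm(MethodHandles.Lookup caller, String name, MethodType type, Object staticArg)
            throws ReflectiveOperationException {
        return new ConstantCallSite(caller.findStatic(caller.lookupClass(), name, type));
    }

    static int run() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle bsmHandle = lookup.findStatic(BootstrapInvokeDemo.class, "bsm",
                    MethodType.methodType(CallSite.class, MethodHandles.Lookup.class,
                            String.class, MethodType.class, Object.class));
            MethodType siteType = MethodType.methodType(int.class, int.class, int.class);
            // Roughly what the JVM does: MethodHandle.invoke with bsmHandle as the
            // receiver, followed by Lookup, name, type and the static arguments.
            CallSite site = (CallSite) bsmHandle.invoke(lookup, "add", siteType, (Object) "static arg");
            return (int) site.getTarget().invokeExact(2, 3);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // 5
    }
}
```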

I need some coffee before I continue this... Does anyone know a JVM expert?

nadako commented 5 years ago

> Does anyone know a JVM expert?

Should I ask around? :)

Simn commented 5 years ago

I mean, I'll figure it out eventually, but I wouldn't mind some advice on this topic for our specific use-case. I "think" we only need a single invoke function which does all the dirty work and returns a CallSite object. But I still don't really understand what I'm supposed to do with that CallSite object.

nadako commented 5 years ago

I found a 2-hour presentation specifically about invokedynamic, but it's in Russian, lol :) https://www.youtube.com/watch?v=DgshYDTpS9I

I'm going to watch it if I get a chance today, and maybe I'll finally get some idea about all this stuff.

Simn commented 5 years ago

I just found out about LambdaMetafactory (lmao), which is the thing you're supposed to use for lambda functions with invokedynamic.

nadako commented 5 years ago

I thought those were for adapting lambdas to single-method interfaces...

Simn commented 5 years ago

From the doc:

> Typically used as a bootstrap method for invokedynamic call sites, to support the lambda expression and method reference expression features of the Java Programming Language.

That sounds exactly like what we want.

Simn commented 5 years ago

... then again, it wants a MethodHandle as one of its arguments, so maybe it's not what we want after all.
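For comparison, here is a sketch of calling LambdaMetafactory.metafactory by hand in Java. It does produce an instance of a single-method interface (java.util.function.Function here), and the MethodHandle it wants is the implementation method, which has to be resolved before the bootstrap call. The class name and the describe method are hypothetical.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.Function;

class LambdaMetafactoryDemo {
    static String describe(Integer i) {
        return "value " + i;
    }

    @SuppressWarnings("unchecked")
    static Function<Integer, String> makeFunction() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            // implMethod: an already-resolved handle to existing code.
            MethodHandle impl = lookup.findStatic(LambdaMetafactoryDemo.class, "describe",
                    MethodType.methodType(String.class, Integer.class));
            CallSite site = LambdaMetafactory.metafactory(
                    lookup,
                    "apply",                                           // interface method name
                    MethodType.methodType(Function.class),             // factory type, nothing captured
                    MethodType.methodType(Object.class, Object.class), // erased Function.apply
                    impl,
                    MethodType.methodType(String.class, Integer.class)); // instantiated type
            // Invoking the call site target yields the Function instance.
            return (Function<Integer, String>) site.getTarget().invokeExact();
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(makeFunction().apply(7)); // value 7
    }
}
```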

Simn commented 5 years ago

The more I look into this the more I wonder if this is useful for us at all. All the usage examples I find rely on the caller lookup to find the field, but that doesn't help in a situation like (obj : Dynamic).call(). We have to look up call on obj, but the bootstrap method cannot do that because it doesn't know wtf obj is.

It looks like we would have to return a MutableCallSite with the method type (which we do know) and then call setTarget on it after resolving the MethodHandle at run-time. But at that point I start to wonder: why? If we have to dynamically resolve the MethodHandle anyway, then why should we bind it to some call site instead of just calling invoke on it?
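The MutableCallSite idea can be sketched in plain Java as follows (class and method names are hypothetical). The call site starts with only a type, the invoker obtained from dynamicInvoker() stays the same object, and setTarget swaps what it calls.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

class MutableSiteDemo {
    static String first(String s) {
        return "first: " + s;
    }

    static String second(String s) {
        return "second: " + s;
    }

    static String[] run() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType type = MethodType.methodType(String.class, String.class);
            // A bootstrap method that only knows the type could return this.
            MutableCallSite site = new MutableCallSite(type);
            // The invoker always calls whatever the site's current target is.
            MethodHandle invoker = site.dynamicInvoker();
            site.setTarget(lookup.findStatic(MutableSiteDemo.class, "first", type));
            String a = (String) invoker.invokeExact("x");
            // Later, code resolved at run-time rebinds the same site.
            site.setTarget(lookup.findStatic(MutableSiteDemo.class, "second", type));
            String b = (String) invoker.invokeExact("x");
            return new String[] { a, b };
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        String[] r = run();
        System.out.println(r[0]); // first: x
        System.out.println(r[1]); // second: x
    }
}
```

As for the "why": in HotSpot the JIT can generally treat a call site's current target as a constant and inline through it, recompiling dependent code when setTarget changes it, whereas a MethodHandle held in an ordinary variable is typically harder for it to optimize.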

I must still be missing something here.

Simn commented 5 years ago

I watched this presentation which mentions the exact use-case:

[Screenshot of a slide from the presentation]

He casually mentions how the bootstrap method looks up "foo" in that method table. That makes no sense to me. Sure, you could construct such a method table and discriminate by name, argument types and return type, but that is just a filter and can easily give ambiguous results.

Maybe the idea is to apply that filter and then check how many results we get? If it's 1, we can use ConstantCallSite, otherwise we use MutableCallSite. In the latter case we would have to inject code which makes sure we end up calling the right thing. Which likely means going very reflective.
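One common way to get the "MutableCallSite plus injected check" behaviour without reflecting on every call is an inline cache: resolve the method on the first call at run-time, then guard the cached handle with a cheap receiver-class test via MethodHandles.guardWithTest. A sketch, assuming a monomorphic cache that simply re-resolves on a miss; Dog, Cat, speak and fallback are hypothetical, and a real bootstrap method would return the site instead of building it in run().

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

class InlineCacheDemo {
    static class Dog { public String speak() { return "woof"; } }
    static class Cat { public String speak() { return "meow"; } }

    static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
    // Type of the hypothetical dynamic call site: (Object receiver) -> String
    static final MethodType SITE_TYPE = MethodType.methodType(String.class, Object.class);
    static int lookups = 0; // number of slow-path resolutions, for demonstration only

    // Guard: does the receiver still have the class we cached a target for?
    static boolean hasClass(Class<?> expected, Object receiver) {
        return receiver.getClass() == expected;
    }

    // The slow path bound to one particular call site, typed (Object) -> String.
    static MethodHandle slowPath(MutableCallSite site) throws ReflectiveOperationException {
        MethodHandle fallback = LOOKUP.findStatic(InlineCacheDemo.class, "fallback",
                MethodType.methodType(String.class, MutableCallSite.class, Object.class));
        return MethodHandles.insertArguments(fallback, 0, site);
    }

    // Resolve "speak" on the receiver's actual class, install it behind a class
    // check, then perform this call. A guard miss comes back here and re-caches.
    static String fallback(MutableCallSite site, Object receiver) throws Throwable {
        lookups++;
        Class<?> cls = receiver.getClass();
        MethodHandle target = LOOKUP.findVirtual(cls, "speak", MethodType.methodType(String.class))
                .asType(SITE_TYPE);
        MethodHandle test = MethodHandles.insertArguments(
                LOOKUP.findStatic(InlineCacheDemo.class, "hasClass",
                        MethodType.methodType(boolean.class, Class.class, Object.class)),
                0, cls);
        site.setTarget(MethodHandles.guardWithTest(test, target, slowPath(site)));
        return (String) target.invokeExact(receiver);
    }

    static String[] run() {
        try {
            lookups = 0;
            MutableCallSite site = new MutableCallSite(SITE_TYPE);
            site.setTarget(slowPath(site)); // nothing cached yet
            MethodHandle call = site.dynamicInvoker();
            Object[] receivers = { new Dog(), new Dog(), new Cat(), new Cat() };
            String[] out = new String[receivers.length];
            for (int i = 0; i < receivers.length; i++) {
                out[i] = (String) call.invokeExact(receivers[i]);
            }
            return out;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

With the receivers Dog, Dog, Cat, Cat the slow path runs twice (once per change of receiver class), and the other two calls go straight through the guard to the cached handle.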

This can't be right, right?

Simn commented 5 years ago

I have added a demo commit to show that I got the attributes and constant pool handling right. It supports something like this:

```haxe
import jvm.Jvm;

class Main {
    static public function main() {
        Sys.println(Jvm.invokedynamic("test", [12, "foo"]));
    }

    @:keep
    static function test(v:Int, s:String):Dynamic {
        return s + ": " + v;
    }
}
```

So we can statically resolve the unknown method "test".

However, at this point I'm all but convinced that this instruction doesn't actually help us. I'll thus stop wasting my time on it and focus on solving actual problems. If we find a use-case for it, we should now have the tools to emit the instruction accordingly.

Simn commented 5 years ago

I realized that this demo was quite easy to generalize by allowing the bootstrap method and static arguments to be specified. So we can now do this from Haxe:

```haxe
import jvm.Jvm;
import java.lang.invoke.*;

class Main {
    static public function main() {
        Sys.println((Jvm.invokedynamic(Jvm.bootstrap, "test", [], "StringContext", 12) : String));
        Sys.println((Jvm.invokedynamic(Main.bootstrap, "test", [createStringContext], 12) : String));
    }

    static public function bootstrap(caller:MethodHandles.MethodHandles_Lookup, name:String, type:MethodType, extraArgResolver:MethodHandle):CallSite {
        var extraArg:java.lang.Object = extraArgResolver.invokeWithArguments(@:privateAccess [].__a);
        type = type.insertParameterTypes(0, @:privateAccess [extraArg.getClass()].__a);
        var handle = caller.findStatic(caller.lookupClass(), name, type);
        return new ConstantCallSite(handle.bindTo(extraArg));
    }

    static function createStringContext() {
        return "CreatedStringContext";
    }

    static function test(context:String, argument:Int){
        return 'context: $context, argument: $argument';
    }
}
```

I... still don't think this is useful for us, but at least the architecture is in place. On a technical level, it's quite nice that we can pass a reference to createStringContext to the bootstrap method and it is then called to obtain a value which is bound to the test function.
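For comparison, the same bootstrap logic written directly in Java. This is a sketch: the names mirror the Haxe example, the call site is linked by hand, and extraArg.getClass() is used just as in the Haxe version, so test must take exactly the resolved value's class (String here).

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class BindToDemo {
    static String createStringContext() {
        return "CreatedStringContext";
    }

    static String test(String context, int argument) {
        return "context: " + context + ", argument: " + argument;
    }

    // Resolve the extra argument, widen the call site type by its class,
    // look up the target, and bind the extra argument as its first parameter.
    static CallSite bootstrap(MethodHandles.Lookup caller, String name, MethodType type,
            MethodHandle extraArgResolver) throws Throwable {
        Object extraArg = extraArgResolver.invoke();
        MethodType full = type.insertParameterTypes(0, extraArg.getClass());
        MethodHandle handle = caller.findStatic(caller.lookupClass(), name, full);
        return new ConstantCallSite(handle.bindTo(extraArg));
    }

    static String run() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle resolver = lookup.findStatic(BindToDemo.class, "createStringContext",
                    MethodType.methodType(String.class));
            // The call site itself only knows (int) -> String.
            CallSite site = bootstrap(lookup, "test",
                    MethodType.methodType(String.class, int.class), resolver);
            return (String) site.dynamicInvoker().invokeExact(12);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // context: CreatedStringContext, argument: 12
    }
}
```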

This stuff is probably mega cool for languages like Ruby where a call to test would be essentially the same as in my example. However, for us it seems like nothing more than a technical curiosity.