TObjectDecl - Githubissues

Simn commented 5 years ago

We need a good way to generate anonymous objects.

Simn commented 5 years ago

One thing to consider here is Reflect.deleteField. 💩

nadako commented 5 years ago

I once wrote this https://gist.github.com/nadako/f01e6837d5508e922f0ef7287f0520bf. Not sure how well it applies to JVM, but I assume nicolas does something like this in HL too?

Simn commented 5 years ago

The problem with all these optimization ideas is that this is specified to work:

class Main {
    static public function main() {
        var td = {
            a: null
        }
        trace(Reflect.deleteField(td, "a")); // true
        trace(Reflect.deleteField(td, "a")); // false
        trace(Reflect.hasField(td, "a")); // false
        td.a = null;
        trace(Reflect.hasField(td, "a")); // true
        trace(Reflect.deleteField(td, "a")); // true
        trace(Reflect.deleteField(td, "a")); // false
    }
}

Note in particular how we go from hasField = false to hasField = true after setting the value again.

nadako commented 5 years ago

I think it makes sense to unspecify deleteField for non-optional fields, but yeah we need to keep some "deleted" flag anyway...

Simn commented 5 years ago

To clarify: We'll have to do some bookkeeping anyway in order to support Reflect.fields, so you might get the idea that Reflect.deleteField could simply remove the field entry from whatever structure we maintain for that. That's correct, but the question then becomes who sets the entry back if a new value is assigned? This could only be done on every write-access, which is extremely stupid and subverts the entire optimization.

nadako commented 5 years ago

subverts the entire optimization.

That's debatable. In current C#/Java targets, anon objects are made of arrays with binary-search lookup every time. I'm pretty sure that having a proper structure with a setter that sets some bool/bit flag will be faster.

Simn commented 5 years ago

I think this is also specified:

class Main {
    static public function main() {
        var td:Dynamic = {};
        td.a = null;
        trace(Reflect.hasField(td, "a")); // true
    }
}

So it's necessary to have a storage for additional fields on these objects anyway. I think the best we can hope for is a lookup-based general implementation with a fast-path optimization for statically known fields. But even with these we have to be careful because of this deleteField crap...

nadako commented 5 years ago

Yep, this is actually very similar to implements Dynamic and how it was done in gencommon (the dynamic "trait" was added to normal class fields).

Simn commented 5 years ago

Here's a Haxe implementation of what I think could work: https://gist.github.com/Simn/f93d4945bcf7991bada5b85b54250565

the only overhead for known fields here is that if (_hxDeletedAField) check in the setter
the reflection map (_hx_fields) is only created if we need it, i.e. if we call of the _hx_ functions on the object.
unknown fields are looked up in a string map
we can omit the entire _hx_deletedAField part (and thus also the setters) if Reflect.deleteField is not part of the compilation

What do you think?

nadako commented 5 years ago

This looks a lot like modern JS engines with their hidden classes that switch to the dictionary mode when delete o[key] is used :)

I think it would be nice to avoid string map for getting/setting known fields. Even if there are ways to avoid it, I'm pretty sure Reflect.field is super-common (I think 99% of templating and scripting engines use that).

We could generate a switch over known field names for _hx_getField/_hx_setField and only fall back to _hx_field if the field was added at run-time. Although I'm not really sure that switch will be faster than a map lookup :)

nadako commented 5 years ago

Also, Here you meant "knownField" => this.knownField, right?

    override function _hx_getKnownFields() {
        return ["knownField" => null];
    }

also why we need to do a copy in _hx_initReflection?

Simn commented 5 years ago

Also, Here you meant "knownField" => this.knownField, right?

Yes

also why we need to do a copy in _hx_initReflection

We don't. I originally had the map as a static var and forgot to remove the copy().

Simn commented 5 years ago

I wonder about this from your design:

Second, for every unification of a class with a structure, we make that class implement the corresponding interface generated for that structure.

That's probably not gonna be enough because we could assign the class to the interface with something inbetween, like Dynamic. ((classInstance : Dynamic) : Interface) would not be detected correctly.

Another thing to consider are type parameter constraints.

nadako commented 5 years ago

((classInstance : Dynamic) : Interface)

I guess this will require some DynamicImplForInterface that goes fully-dynamic (basically implementing get/set for Interface by using _hx_getField). The only thing that we can maybe do here is to cache that wrapper, although I'm not sure it's even worth it.

Simn commented 5 years ago

I'm not worried about field access because we're gonna have a slow route for that anyway. My concern is the run-time assignment because we would have to make sure that classInstance can be assigned to Interface (bad name, I meant one of these anon interfaces).

nadako commented 5 years ago

Yeah, well one option is to generate new DynamicInterface(classInstance), where:

class DynamicInterface implements Interface { // actually it has to also implement HxObject itself
  public var knownField(get,set):Int;

  final __obj:HxObject;

  public function new(obj:HxObject) {
     this.__obj = obj;
  }

  function get_knownField():Int return __obj._hx_getField("knownField");
  function set_knownField(v:Int):Int return __obj._hx_setField("knownField", v);
}

Of course it's technically an extra allocation, but a) if you do that you deserve it and b) it will probably be inlined on stack by Java/JVM anyway.

Simn commented 5 years ago

I've added the dynamic version for now. Initialization is somewhat verbose at instruction-level:

        var obj = {
            a: null,
            b: 1,
            c: "foo"
        }

  public void testObjectDecl();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=7, locals=8, args_size=1
         0: new           #571                // class haxe/jvm/DynamicObject
         3: dup
         4: invokespecial #572                // Method haxe/jvm/DynamicObject."<init>":()V
         7: dup
         8: dup
         9: ldc_w         #576                // String a
        12: aconst_null
        13: invokevirtual #575                // Method haxe/jvm/DynamicObject._hx_setField:(Ljava/lang/String;Ljava/lang/Object;)V
        16: dup
        17: dup
        18: ldc_w         #577                // String b
        21: iconst_1
        22: invokestatic  #53                 // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
        25: invokevirtual #575                // Method haxe/jvm/DynamicObject._hx_setField:(Ljava/lang/String;Ljava/lang/Object;)V
        28: dup
        29: dup
        30: ldc_w         #578                // String c
        33: ldc_w         #554                // String foo
        36: invokevirtual #575                // Method haxe/jvm/DynamicObject._hx_setField:(Ljava/lang/String;Ljava/lang/Object;)V

This is gonna be quite efficient though.

Simn commented 5 years ago

The alternative would be allocating two native arrays, one for names and one for values. That way we would only have 1 set call instead of N, but we would allocate two native arrays and have N store instructions. Plus the set function would then have to iterate on these arrays, which isn't free either. Not sure if that's worth it or not.

Simn commented 5 years ago

We can improve this in 3 stages:

Stage 1: Generate TObjectDecl as class instance

This should be uncontroversial: Instead of creating a DynamicObject and calling a bunch of _hx_setField on it, we create an (anonymous) data class and call its constructor. Given that dynamic field access already works on class instances, it's still going to work after this change.

We can try to re-use existing data classes for this, but we have to consider evaluation order: The order of the TObjectDecl fields has to be respected wenn calling the constructor.

Stage 2: Optimize FAnon access with a checkcast

If we have TField(e1,FAnon cf), we can generate the equivalent of (e1 instanceof AnonClassForE1 ? (e1 : AnonClassForE1).theField : (e1 : Dynamic).theField. This replaces the fully dynamic lookup with an instanceof + checkcast branch, which should be several orders of magnitude faster.

Stage 3: nadako-interfaces

I have to think more about this idea first. I'm a bit concerned about polluting class declarations with lots of possible interface relations to anonymous types. I would like to try + benchmark this at some point though.

I wonder if we should consider typedefs at all to identify anons types. Nothing really prevents us from generating a class for haxe.macro.Expr with the expr and pos fields. We can then find that type when we have a TAnon. It's always a bit annoying because we have to sort the fields and consider both names and types, but it can certainly be done.

This would mean that some truly anonymous types might get misidentified, but I don't think this is an issue as long as we don't assign any semantics to these classes. And even then I don't see what could go wrong with that.

Simn commented 5 years ago

class Main {
    static public function main() {
        var obj = getObj();
        var stamp = haxe.Timer.stamp();
        var target = stamp + 2.0;
        var num = 0;
        while (haxe.Timer.stamp() < target) {
            ++num;
            call(obj.obj.obj.obj);
        }
        trace(num);
    }

    static function call(d:Dynamic) { }

    static function getObj() {
        return { obj: { obj: { obj: { obj: 12 }}}};
    }
}

Merge_2019-03-29_22-30-08

Note that there aren't actually any temp vars, that's just how the decompiler displays this.

before:     6855521
after   : 287782212

nadako commented 5 years ago

I wonder if we should consider typedefs at all to identify anons types.

That would be very nice for readability and native interop :)

nadako commented 5 years ago

Oh, regarding

polluting class declarations with lots of possible interface relations to anonymous types

My idea was to detect unifications and only add implements when needed.

Simn commented 5 years ago

My idea was to detect unifications and only add implements when needed.

But what about ((classInstance : Dynamic) : Interface)?

nadako commented 5 years ago

I guess it would be fair to have a fully-dynamic wrapper implementation for this, like we talked before: https://github.com/Simn/genjvm/issues/1#issuecomment-474818936

nadako commented 5 years ago

Actually, regarding the example code. I think we should have 2 anon classes generated for that nested obj structures: one with generic Object obj, and one with int obj. This shouldn't be too bad in regards of generated code, if we reduce all types to the JVM ones (so basically object and numbers, iirc), but should be nicer wrt boxing.

Simn commented 5 years ago

Oh it does that, I just didn't .obj enough so it never reached the integer field:

    public static void main(String[] args) {
        Object obj = getObj();
        Object var10000 = obj instanceof Anon1 ? ((Anon1)obj).obj : Jvm.readField(obj, "obj");
        var10000 = var10000 instanceof Anon1 ? ((Anon1)var10000).obj : Jvm.readField(var10000, "obj");
        var10000 = var10000 instanceof Anon1 ? ((Anon1)var10000).obj : Jvm.readField(var10000, "obj");
        call(var10000 instanceof Anon2 ? ((Anon2)var10000).obj : Jvm.toInt(Jvm.readField(var10000, "obj")));
    }

Interestingly, that decompilation is missing the boxing on for the call call where the argument is Dynamic. It's in the bytecode though:

        69: instanceof    #25                 // class haxe/generated/Anon2
        72: ifeq          84
        75: checkcast     #25                 // class haxe/generated/Anon2
        78: getfield      #28                 // Field haxe/generated/Anon2.obj:I
        81: goto          92
        84: ldc           #17                 // String obj
        86: invokestatic  #23                 // Method haxe/jvm/Jvm.readField:(Ljava/lang/Object;Ljava/lang/String;)Ljava/lang/Object;
        89: invokestatic  #32                 // Method haxe/jvm/Jvm.toInt:(Ljava/lang/Object;)I
        92: invokestatic  #38                 // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
        95: invokestatic  #42                 // Method call:(Ljava/lang/Object;)V

It goes 72 -> 75 -> 78 -> 81 -> 92 -> 95, and 92 has the boxing.

This also makes me realize that the other path does unbox + box (89 + 92). Not sure if there's a good way to fix that though because we need a consistent stack map at the branch join (92 which is reached from 81 (where we have an int) and 89 (where we would have an Integer without the unboxing).

Simn commented 5 years ago

Addressed the casting in #25:

        69: instanceof    #25                 // class haxe/generated/Anon2
        72: ifeq          87
        75: checkcast     #25                 // class haxe/generated/Anon2
        78: getfield      #28                 // Field haxe/generated/Anon2.obj:I
        81: invokestatic  #34                 // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
        84: goto          92
        87: ldc           #17                 // String obj
        89: invokestatic  #23                 // Method haxe/jvm/Jvm.readField:(Ljava/lang/Object;Ljava/lang/String;)Ljava/lang/Object;
        92: invokestatic  #38                 // Method call:(Ljava/lang/Object;)V

It now has the Integer.valueOf within the branch (81) and doesn't cast at all on the other path.

Simn commented 5 years ago

And for completeness, this is the code when we expect Int instead of Dynamic on the call argument:

        69: instanceof    #25                 // class haxe/generated/Anon2
        72: ifeq          84
        75: checkcast     #25                 // class haxe/generated/Anon2
        78: getfield      #28                 // Field haxe/generated/Anon2.obj:I
        81: goto          92
        84: ldc           #17                 // String obj
        86: invokestatic  #23                 // Method haxe/jvm/Jvm.readField:(Ljava/lang/Object;Ljava/lang/String;)Ljava/lang/Object;
        89: invokestatic  #32                 // Method haxe/jvm/Jvm.toInt:(Ljava/lang/Object;)I
        92: invokestatic  #36                 // Method call:(I)V

nadako commented 5 years ago

awesome! so clean

Simn commented 5 years ago

I'm still a bit concerned that the decompilers don't show that cast. IntelliJ just omits it whereas JD simply says /* Error */ on the entirety of main. x)

Simn commented 5 years ago

I went ahead and did the typedef identification thing so we now get nice names:

public class Expr extends DynamicObject {
    public ExprDef expr;
    public Object pos;

    protected StringMap _hx_getKnownFields() {
        StringMap tmp = new StringMap();
        tmp.set("expr", this.expr);
        tmp.set("pos", this.pos);
        return tmp;
    }

    public Expr(ExprDef expr, Object pos) {
        this.expr = expr;
        this.pos = pos;
        super();
    }
}

As I said before, this would also pick up NotExpr if it has the same structure, but I think that's fair.

nadako commented 5 years ago

nice, why this before super tho? doesn't that supposed to not work?

Simn commented 5 years ago

Uhm, interesting, I didn't really notice that... Looks like the JVM is fine with this as long as we don't actually call something on this.

nadako commented 5 years ago

I think we should not generate an empty Anon class for {} and just use DynamicObject directly.

Simn commented 5 years ago

Indeed

Simn commented 4 years ago

We now generate proper interfaces, so this is resolved.

Simn / genjvm

TObjectDecl #1

Stage 1: Generate TObjectDecl as class instance

Stage 2: Optimize FAnon access with a checkcast

Stage 3: nadako-interfaces