Closed Simn closed 4 years ago
One thing to consider here is Reflect.deleteField
. 💩
I once wrote this https://gist.github.com/nadako/f01e6837d5508e922f0ef7287f0520bf. Not sure how well it applies to JVM, but I assume nicolas does something like this in HL too?
The problem with all these optimization ideas is that this is specified to work:
class Main {
static public function main() {
var td = {
a: null
}
trace(Reflect.deleteField(td, "a")); // true
trace(Reflect.deleteField(td, "a")); // false
trace(Reflect.hasField(td, "a")); // false
td.a = null;
trace(Reflect.hasField(td, "a")); // true
trace(Reflect.deleteField(td, "a")); // true
trace(Reflect.deleteField(td, "a")); // false
}
}
Note in particular how we go from hasField = false to hasField = true after setting the value again.
I think it makes sense to unspecify deleteField
for non-optional fields, but yeah we need to keep some "deleted" flag anyway...
To clarify: We'll have to do some bookkeeping anyway in order to support Reflect.fields
, so you might get the idea that Reflect.deleteField
could simply remove the field entry from whatever structure we maintain for that. That's correct, but the question then becomes who sets the entry back if a new value is assigned? This could only be done on every write-access, which is extremely stupid and subverts the entire optimization.
subverts the entire optimization.
That's debatable. In current C#/Java targets, anon objects are made of arrays with binary-search lookup every time. I'm pretty sure that having a proper structure with a setter that sets some bool/bit flag will be faster.
I think this is also specified:
class Main {
static public function main() {
var td:Dynamic = {};
td.a = null;
trace(Reflect.hasField(td, "a")); // true
}
}
So it's necessary to have a storage for additional fields on these objects anyway. I think the best we can hope for is a lookup-based general implementation with a fast-path optimization for statically known fields. But even with these we have to be careful because of this deleteField crap...
Yep, this is actually very similar to implements Dynamic
and how it was done in gencommon (the dynamic "trait" was added to normal class fields).
Here's a Haxe implementation of what I think could work: https://gist.github.com/Simn/f93d4945bcf7991bada5b85b54250565
if (_hxDeletedAField)
check in the setter_hx_fields
) is only created if we need it, i.e. if we call of the _hx_
functions on the object._hx_deletedAField
part (and thus also the setters) if Reflect.deleteField
is not part of the compilationWhat do you think?
This looks a lot like modern JS engines with their hidden classes that switch to the dictionary mode when delete o[key]
is used :)
I think it would be nice to avoid string map for getting/setting known fields. Even if there are ways to avoid it, I'm pretty sure Reflect.field
is super-common (I think 99% of templating and scripting engines use that).
We could generate a switch over known field names for _hx_getField
/_hx_setField
and only fall back to _hx_field
if the field was added at run-time. Although I'm not really sure that switch will be faster than a map lookup :)
Also, Here you meant "knownField" => this.knownField
, right?
override function _hx_getKnownFields() {
return ["knownField" => null];
}
also why we need to do a copy in _hx_initReflection
?
Also, Here you meant "knownField" => this.knownField, right?
Yes
also why we need to do a copy in _hx_initReflection
We don't. I originally had the map as a static var and forgot to remove the copy().
I wonder about this from your design:
Second, for every unification of a class with a structure, we make that class implement the corresponding interface generated for that structure.
That's probably not gonna be enough because we could assign the class to the interface with something inbetween, like Dynamic. ((classInstance : Dynamic) : Interface)
would not be detected correctly.
Another thing to consider are type parameter constraints.
((classInstance : Dynamic) : Interface)
I guess this will require some DynamicImplForInterface
that goes fully-dynamic (basically implementing get/set for Interface
by using _hx_getField
). The only thing that we can maybe do here is to cache that wrapper, although I'm not sure it's even worth it.
I'm not worried about field access because we're gonna have a slow route for that anyway. My concern is the run-time assignment because we would have to make sure that classInstance
can be assigned to Interface
(bad name, I meant one of these anon interfaces).
Yeah, well one option is to generate new DynamicInterface(classInstance)
, where:
class DynamicInterface implements Interface { // actually it has to also implement HxObject itself
public var knownField(get,set):Int;
final __obj:HxObject;
public function new(obj:HxObject) {
this.__obj = obj;
}
function get_knownField():Int return __obj._hx_getField("knownField");
function set_knownField(v:Int):Int return __obj._hx_setField("knownField", v);
}
Of course it's technically an extra allocation, but a) if you do that you deserve it and b) it will probably be inlined on stack by Java/JVM anyway.
I've added the dynamic version for now. Initialization is somewhat verbose at instruction-level:
var obj = {
a: null,
b: 1,
c: "foo"
}
public void testObjectDecl();
descriptor: ()V
flags: ACC_PUBLIC
Code:
stack=7, locals=8, args_size=1
0: new #571 // class haxe/jvm/DynamicObject
3: dup
4: invokespecial #572 // Method haxe/jvm/DynamicObject."<init>":()V
7: dup
8: dup
9: ldc_w #576 // String a
12: aconst_null
13: invokevirtual #575 // Method haxe/jvm/DynamicObject._hx_setField:(Ljava/lang/String;Ljava/lang/Object;)V
16: dup
17: dup
18: ldc_w #577 // String b
21: iconst_1
22: invokestatic #53 // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
25: invokevirtual #575 // Method haxe/jvm/DynamicObject._hx_setField:(Ljava/lang/String;Ljava/lang/Object;)V
28: dup
29: dup
30: ldc_w #578 // String c
33: ldc_w #554 // String foo
36: invokevirtual #575 // Method haxe/jvm/DynamicObject._hx_setField:(Ljava/lang/String;Ljava/lang/Object;)V
This is gonna be quite efficient though.
The alternative would be allocating two native arrays, one for names and one for values. That way we would only have 1 set call instead of N, but we would allocate two native arrays and have N store instructions. Plus the set function would then have to iterate on these arrays, which isn't free either. Not sure if that's worth it or not.
We can improve this in 3 stages:
This should be uncontroversial: Instead of creating a DynamicObject and calling a bunch of _hx_setField
on it, we create an (anonymous) data class and call its constructor. Given that dynamic field access already works on class instances, it's still going to work after this change.
We can try to re-use existing data classes for this, but we have to consider evaluation order: The order of the TObjectDecl
fields has to be respected wenn calling the constructor.
If we have TField(e1,FAnon cf)
, we can generate the equivalent of (e1 instanceof AnonClassForE1 ? (e1 : AnonClassForE1).theField : (e1 : Dynamic).theField
. This replaces the fully dynamic lookup with an instanceof + checkcast branch, which should be several orders of magnitude faster.
I have to think more about this idea first. I'm a bit concerned about polluting class declarations with lots of possible interface relations to anonymous types. I would like to try + benchmark this at some point though.
I wonder if we should consider typedefs at all to identify anons types. Nothing really prevents us from generating a class for haxe.macro.Expr
with the expr
and pos
fields. We can then find that type when we have a TAnon
. It's always a bit annoying because we have to sort the fields and consider both names and types, but it can certainly be done.
This would mean that some truly anonymous types might get misidentified, but I don't think this is an issue as long as we don't assign any semantics to these classes. And even then I don't see what could go wrong with that.
class Main {
static public function main() {
var obj = getObj();
var stamp = haxe.Timer.stamp();
var target = stamp + 2.0;
var num = 0;
while (haxe.Timer.stamp() < target) {
++num;
call(obj.obj.obj.obj);
}
trace(num);
}
static function call(d:Dynamic) { }
static function getObj() {
return { obj: { obj: { obj: { obj: 12 }}}};
}
}
Note that there aren't actually any temp vars, that's just how the decompiler displays this.
before: 6855521
after : 287782212
I wonder if we should consider typedefs at all to identify anons types.
That would be very nice for readability and native interop :)
Oh, regarding
polluting class declarations with lots of possible interface relations to anonymous types
My idea was to detect unifications and only add implements
when needed.
My idea was to detect unifications and only add implements when needed.
But what about ((classInstance : Dynamic) : Interface)
?
I guess it would be fair to have a fully-dynamic wrapper implementation for this, like we talked before: https://github.com/Simn/genjvm/issues/1#issuecomment-474818936
Actually, regarding the example code. I think we should have 2 anon classes generated for that nested obj structures: one with generic Object obj
, and one with int obj
. This shouldn't be too bad in regards of generated code, if we reduce all types to the JVM ones (so basically object and numbers, iirc), but should be nicer wrt boxing.
Oh it does that, I just didn't .obj
enough so it never reached the integer field:
public static void main(String[] args) {
Object obj = getObj();
Object var10000 = obj instanceof Anon1 ? ((Anon1)obj).obj : Jvm.readField(obj, "obj");
var10000 = var10000 instanceof Anon1 ? ((Anon1)var10000).obj : Jvm.readField(var10000, "obj");
var10000 = var10000 instanceof Anon1 ? ((Anon1)var10000).obj : Jvm.readField(var10000, "obj");
call(var10000 instanceof Anon2 ? ((Anon2)var10000).obj : Jvm.toInt(Jvm.readField(var10000, "obj")));
}
Interestingly, that decompilation is missing the boxing on for the call
call where the argument is Dynamic
. It's in the bytecode though:
69: instanceof #25 // class haxe/generated/Anon2
72: ifeq 84
75: checkcast #25 // class haxe/generated/Anon2
78: getfield #28 // Field haxe/generated/Anon2.obj:I
81: goto 92
84: ldc #17 // String obj
86: invokestatic #23 // Method haxe/jvm/Jvm.readField:(Ljava/lang/Object;Ljava/lang/String;)Ljava/lang/Object;
89: invokestatic #32 // Method haxe/jvm/Jvm.toInt:(Ljava/lang/Object;)I
92: invokestatic #38 // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
95: invokestatic #42 // Method call:(Ljava/lang/Object;)V
It goes 72 -> 75 -> 78 -> 81 -> 92 -> 95, and 92 has the boxing.
This also makes me realize that the other path does unbox + box (89 + 92). Not sure if there's a good way to fix that though because we need a consistent stack map at the branch join (92 which is reached from 81 (where we have an int) and 89 (where we would have an Integer without the unboxing).
Addressed the casting in #25:
69: instanceof #25 // class haxe/generated/Anon2
72: ifeq 87
75: checkcast #25 // class haxe/generated/Anon2
78: getfield #28 // Field haxe/generated/Anon2.obj:I
81: invokestatic #34 // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
84: goto 92
87: ldc #17 // String obj
89: invokestatic #23 // Method haxe/jvm/Jvm.readField:(Ljava/lang/Object;Ljava/lang/String;)Ljava/lang/Object;
92: invokestatic #38 // Method call:(Ljava/lang/Object;)V
It now has the Integer.valueOf
within the branch (81) and doesn't cast at all on the other path.
And for completeness, this is the code when we expect Int
instead of Dynamic
on the call
argument:
69: instanceof #25 // class haxe/generated/Anon2
72: ifeq 84
75: checkcast #25 // class haxe/generated/Anon2
78: getfield #28 // Field haxe/generated/Anon2.obj:I
81: goto 92
84: ldc #17 // String obj
86: invokestatic #23 // Method haxe/jvm/Jvm.readField:(Ljava/lang/Object;Ljava/lang/String;)Ljava/lang/Object;
89: invokestatic #32 // Method haxe/jvm/Jvm.toInt:(Ljava/lang/Object;)I
92: invokestatic #36 // Method call:(I)V
awesome! so clean
I'm still a bit concerned that the decompilers don't show that cast. IntelliJ just omits it whereas JD simply says /* Error */
on the entirety of main
. x)
I went ahead and did the typedef identification thing so we now get nice names:
public class Expr extends DynamicObject {
public ExprDef expr;
public Object pos;
protected StringMap _hx_getKnownFields() {
StringMap tmp = new StringMap();
tmp.set("expr", this.expr);
tmp.set("pos", this.pos);
return tmp;
}
public Expr(ExprDef expr, Object pos) {
this.expr = expr;
this.pos = pos;
super();
}
}
As I said before, this would also pick up NotExpr
if it has the same structure, but I think that's fair.
nice, why this before super tho? doesn't that supposed to not work?
Uhm, interesting, I didn't really notice that... Looks like the JVM is fine with this as long as we don't actually call something on this
.
I think we should not generate an empty Anon
class for {}
and just use DynamicObject
directly.
Indeed
We now generate proper interfaces, so this is resolved.
We need a good way to generate anonymous objects.