HaxeFoundation / haxe

Haxe - The Cross-Platform Toolkit
https://haxe.org
6.17k stars 654 forks source link

Array<T> variance on static targets #4872

Open Simn opened 8 years ago

Simn commented 8 years ago

To avoid fragmentation I'm consolidating multiple issues here that likely have the same cause and solution:

2103

2579

3429

4187

4340

4424

4783

Edit NC, added

5799

ncannasse commented 8 years ago

I have studied quite in detail the problem of variance with Arrays, and the particular case of Array<Dynamic>, as part of my work on HL.

I came up with the following design that I would like to be generalized on all "static" targets that have type-specific arrays (usually for basic types such as Array<Float>), so that would concern C++,C#,Java, and HL.

I define several Array classes implementations:

The trick is the following:

When creating a new ArrayDyn, we have an extra flag in it which is named allowReinterpret, which means that when we cast such ArrayDyn to an Array<Int> for instance, we will first reinterpret the underlying ArrayObj to an ArrayBasic<Int> before returning it, then set allowReinterpret to false, which means that after the first cast to a known type, the Array type becomes fixed, which prevent copying the whole array on each of such casts.

I think this cover all the problematic cases (JSON, Serialization, etc.)

This requires two kind of runtime support/emulation:

The full source code for HL solution is available here: https://github.com/HaxeFoundation/haxe/tree/hl/std/hl/types (ArrayBase, ArrayObj, ArrayDyn)

I would like to hear about @hughsando and @waneck comment on this, then if it's fine with both of you, I can write unit tests for it so we make sure that this behavior is correctly standardized on all static platforms.

ncannasse commented 8 years ago

PS: one last thing I haven't yet worked on is referential equality : we also want our ArrayDyn == its underlying ArrayBase.

hughsando commented 8 years ago

I have implemented a similar thing with "VirtualArray" The array starts off unknown, and then as you add stuff to it, the underlying representation changes until you ultimately cast it, and then it becomes fixed. So if you add a bunch of ints, it is an int array. If you then add a float, it becomes a float array. If you add a null, it becomes a Dynamic array. This should help by avoiding boxing in a lot of cases. You can try this in hxcpp target by swapping the commenting of these likes: "cpp::ArrayBase" ^ suffix (* "cpp::VirtualArray" ^ suffix *) (side note - can have a global variable to switch this in ocaml, or do I need to pass a context everywhere?)

It seems to work ok, but I have not got it working in cppia.

On Tue, Feb 23, 2016 at 7:59 PM, Nicolas Cannasse notifications@github.com wrote:

PS: one last thing I haven't yet worked on is referential equality : we also want our ArrayDyn == its underlying Array.

— Reply to this email directly or view it on GitHub https://github.com/HaxeFoundation/haxe/issues/4872#issuecomment-187671417 .

ncannasse commented 8 years ago

@hughsando the problem with this approach is that your array keep changing its underlying type, for instance if you add two floats, then cast to Array<Int>, then add floats again, what is the behavior ? My proposal is that the array is either in an "unkown state" (dynamic) or in a known state, and once it is it cannot change anymore. And since the first cast to an not-Array-Dynamic will make it a known state, it guarantee that you never happen to have several copies of the underlying array which does not preserve side effects.

hughsando commented 8 years ago

The casting only ever happens in one direction: empty -> Int -> Float -> Dynamic (maybe skip Float or Int), or empty -> String -> Dynamic or empty -> Bool -> Dynamic. or just empty -> Dynamic. As soon as you cast, it locks it in forever (like your solution). The good thing about this is that the final cast (eg, to Array) is usually a no-op if you have only added ints - ie, zero changes. If you add a whole lot of ints, a single float and then an object, you could get 2 transitions, where perhaps none was required. But it is more likely to be a homogeneous array, so I would think that this rarer than the "not need to do anything" case. Adding two floats and then casting to Ints shows a breakdown of the compiler logic so throwing an error might be appropriate, but currently it will perform Float->Int and lock it in.

On Wed, Feb 24, 2016 at 4:01 AM, Nicolas Cannasse notifications@github.com wrote:

@hughsando https://github.com/hughsando the problem with this approach is that your array keep changing its underlying type, for instance if you add two floats, then cast to Array, then add floats again, what is the behavior ? My proposal is that the array is either in an "unkown state" (dynamic) or in a known state, and once it is it cannot change anymore. And since the first cast to an not-Array-Dynamic will make it a known state, it guarantee that you never happen to have several copies of the underlying array which does not preserve side effects.

— Reply to this email directly or view it on GitHub https://github.com/HaxeFoundation/haxe/issues/4872#issuecomment-187878494 .

ncannasse commented 8 years ago

Ok, that seems good to me, I could indeed optimize HL this way as well, it's not something really detectable by the enduser anyway.

Any reason it has not been actived by default in hxcpp yet ?

Also, still waiting to hear from @waneck :)

hughsando commented 8 years ago

Testing and cppia mainly - I wrote it, but did not have time to put out fires if it went wrong. Now I've moved over to "haxe mode", I will try to git it in in the next week or two.

Hugh

On Wed, Feb 24, 2016 at 8:49 PM, Nicolas Cannasse notifications@github.com wrote:

Ok, that seems good to me, I could indeed optimize HL this way as well, it's not something really detectable by the enduser anyway.

Any reason it has not been actived by default in hxcpp yet ?

Also, still waiting to hear from @waneck https://github.com/waneck :)

— Reply to this email directly or view it on GitHub https://github.com/HaxeFoundation/haxe/issues/4872#issuecomment-188240459 .

Simn commented 8 years ago

I just wanted to start adding tests for this but didn't get very far:

class Main {
    static function main() {
        var a:Array<Dynamic> = [1, 2, 3];
        trace("I'm just here to prevent the analyzer fusion");
        var b:Array<Int> = cast a;
        trace("Yeah me too");
        trace(a == b); // false
    }
}

@hughsando: Wasn't this supposed to work now?

hughsando commented 8 years ago

Looks like it is an issue with "op==". If you do: b.push(4); trace(a); you get 1,2,3,4

hughsando commented 8 years ago

I added the operators to hxcpp.

Simn commented 8 years ago

That helped with the direct ==, but it still fails if we hide the arrays behind type parameters:

class Main {
    static function main() {
        var a:Array<Dynamic> = [1, 2, 3];
        trace("I'm just here to prevent the analyzer fusion");
        var b:Array<Int> = cast a;
        trace("Yeah me too");
        eq(a, b);
    }

    static function eq<T>(e1:T, e2:T) trace(e1 == e2); // false
}
Simn commented 8 years ago

P.S @ncannasse: That fails on HL too, even in the interpreted version.

Simn commented 8 years ago

There's not much of a point in me adding a test that fails on multiple targets. I'll assign this issue to the 3.3.0 final milestone so we can see what we want to do here after the RC.

ncannasse commented 8 years ago

@Simn which targets apart hxcpp are concerned ? Java and C# I presume ?

@hughsando could you look into it for RC? I think this kind of non-equality could break people code when trying RC.

Simn commented 8 years ago

I think hxcpp is fine now actually, so it's probably Java, C# and HL.

ncannasse commented 8 years ago

Please submit the unit tests with #if !(hl || java || cs) for the parts that fail for now, and open a separate issue for these.

Simn commented 8 years ago

I don't like these #if tests, it's too easy to forget about them. I've added it without the conditional compilation to a separate branch so we can forget about that instead.

Simn commented 8 years ago

As expected the test fails on C# and HL: https://travis-ci.org/Simn/haxe/builds/120320083

ncannasse commented 8 years ago

I'm assigning to @waneck because HL is not ready for 3.3 anyway

waneck commented 8 years ago

I won't be able to get this done for 3.3. Pushing to 3.4

vroad commented 8 years ago

Moved from https://github.com/HaxeFoundation/haxe/issues/5467 .

It looks like C# implementation copying all fields of original Array. (Using reflection for some reason). Doesn't that cause more issues? That's slow, and newly created array no longer refers to the same array. I don't want Haxe to copy Array in that case.

nadako commented 8 years ago

Yes, this is #2103.

vroad commented 8 years ago

Is reflection always needed to achieve this? I don't know where reflection calls came from. Are they automatically generated by Haxe?

Gama11 commented 7 years ago

I think hxcpp is fine now actually, so it's probably Java, C# and HL.

@Simn Both of the test cases you posted here still trace false for me on hxcpp (Haxe 3.4.0 and hxcpp 3.4.49). Or was it fixed at some point and then broken again?

Simn commented 6 years ago

I have checked hxcpp again and both tests work there now. I also made sure that the analyzer doesn't inadvertently fix it.

I didn't test HL because all I ever get out of that these days is "Unsupported bytecode version". It's definitely still broken on Java and C# though.

Simn commented 6 years ago

My CI says this fails on C# and PHP of all things.

Simn commented 6 years ago

I confirmed HL to still be failing here as well.

ncannasse commented 6 years ago

I have fixed HL support for this specific test in be1a102a910ec8446dce8c325dc23aeb9a768033

ncannasse commented 6 years ago

This might break some runtime wrt Array handlings, we need proper unit testing for Haxe 4.0, we will discuss if we can make java/c# compliant in time but we need to pass the tests on other platforms.

Simn commented 6 years ago

@RealyUniqueName Could you look at the PHP failure?

class Main {
    @:analyzer(ignore)
    static public function main() {
        var a:Array<Dynamic> = [1, 2, 3];
        var b:Array<Int> = cast a;
        b.push(4);
        trace(a[3]); // null
    }
}

Note that the @:analyzer(ignore) is required to reproduce this.

waneck commented 6 years ago

Java shouldn't have any issue like that.

As for C#, my real issue with implementing this is how should we deal with user-created type parameters? The possible solutions I can think are:

  1. We don't generate type parameters for hxGen C# types (e.g. make erase_generics the default)
  2. We keep our old behavior, so that Array is a special case
  3. We try to generalize the way we handle Array so it works for any generic class

I can't think of a way to generalize the Array handling, but I'd be interested to hear ideas. As for between 1 and 2, I'm also not sure what's best. Does anyone have strong opinions regarding them?

Simn commented 6 years ago

What problems does erase_generics cause?

waneck commented 6 years ago

I think it's mostly about performance and generated code compatibility with native C# code

hughsando commented 6 years ago

Arrays are special - in particular Array is the one thing the the whole haxe type system that is inconsistent. I think if the above was "var a:Array = [1,2,3]" it would make everything consistent and logical for developers and backend writers Or maybe "IArray" interface. The solution hxcpp (and hl?) uses is to use an extra level of indirection for Array - this punishes Arrays-of-actual-Dynamic because there is not way to tell it apart from the "magical array that can be an array-of-int" shown here.

On Wed, Jun 6, 2018 at 1:36 AM, Cauê Waneck notifications@github.com wrote:

Java shouldn't have any issue like that.

As for C#, my real issue with implementing this is how should we deal with user-created type parameters? The possible solutions I can think are:

  1. We don't generate type parameters for hxGen C# types (e.g. make erase_generics the default)
  2. We keep our old behavior, so that Array is a special case
  3. We try to generalize the way we handle Array so it works for any generic class

I can't think of a way to generalize the Array handling, but I'd be interested to hear ideas. As for between 1 and 2, I'm also not sure what's best. Does anyone have strong opinions regarding them?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/HaxeFoundation/haxe/issues/4872#issuecomment-394796613, or mute the thread https://github.com/notifications/unsubscribe-auth/ABlp1lM3ircLKQDruzIvat-8RN-xTQjzks5t5sG0gaJpZM4Hgg8I .

RealyUniqueName commented 6 years ago

@RealyUniqueName Could you look at the PHP failure?

That's because Array.push() is inlined in std of php backend. That push is generated as:

$b = $a;
$this1 = $b->arr;
$idx = $b->length;
$this1[$idx] = 4;
++$b->length;

And $b->arr is a native PHP array. And native arrays are passed by copy, not by reference. I think I can fix this in std without removing inline. But this is not ok to automatically introduce temp vars for accessing object fields. I guess it also affects native structs on C# target.

Simn commented 6 years ago

I've updated the branch again:

RealyUniqueName commented 6 years ago

Can you please point me to the branch? Can't find it )

Simn commented 6 years ago

Sure, I keep these branches on my own GitHub because that tends to be better for CI: https://github.com/Simn/haxe/tree/array_variance_tests

RealyUniqueName commented 6 years ago

PHP fails because of the same problem described in my previous comment: https://github.com/HaxeFoundation/haxe/issues/4872#issuecomment-394954238

Simn commented 6 years ago

Yes but is that for me or for you to fix? If it's a problem with temp vars we have to check if it can be reproduced manually (without inline).

RealyUniqueName commented 6 years ago

I think it's for you to fix. Because those temp vars will case the same problem for any field of non-scalar value-type.

Simn commented 6 years ago

Then we have to mark the affected types with some metadata. Though I don't really like having to support copy-semantics just for this.

Note that this example also doesn't behave like the syntax would suggest:

class Main {
    @:analyzer(ignore)
    static public function main() {
        var a = [1, 2, 3];
        var b = @:privateAccess a.arr;
        b[3] = 4;
        trace(a[3]); // null
    }
}
RealyUniqueName commented 6 years ago

Note that this example also doesn't behave like the syntax would suggest

This is a defined behavior for php.NativeArray so it's up to the user to decide what to do with it. On the other hand inst.inlinedMethod() temp vars generation is out of the user's control, so it should behave the same way with and without inlining.

Simn commented 6 years ago

This would mean that for certain types we never create a temp var. This is quite problematic for various cases I can think of.

IMO this has to be approached differently, e.g. by not using a field access expression on the native array type.

Simn commented 5 years ago

Moved to 4.1 because it's a rather involved project.

hoseyjoe commented 4 years ago

Any motion on this for the exciting 4.1 release? I'm hoping https://github.com/HaxeFoundation/haxe/issues/8314 gets cleared.

Simn commented 4 years ago

This won't be resolved for 4.1. Which target are you concerned about?

hoseyjoe commented 4 years ago

C# We started using haxe 4 in preview 3. We liked the syntax so we started incorporating it as we weren't going to release for a year. We figured nothing major would change. Now our Server side (C#) is compiling out of rc2 and our client side (JS) is compiling out of 4.0.5

Simn commented 4 years ago

@kLabz What do you think about this? Any workaround you can recommend for #8314? I really don't want people to be stuck on RC versions.

kLabz commented 4 years ago

I don't think there's much that can be done except fix this issue.

The same kind of functionalities can be achieved, using c# utils (so not in a cross-target way), and without Array<T> (so it requires some rewrite):

import cs.system.collections.generic.List_1;
import newtonsoft.json.JsonConvert;

@:publicFields
class DataItem {
    var data:String;
    var strArray:List_1<String>;
    var intArray:List_1<Int>;
    var dynArray:List_1<Dynamic>;
    var anyArray:List_1<Any>;
}

class Main {
    public static function main():Void {
        var t:DataItem = JsonConvert.DeserializeObject(
            haxe.Json.stringify({data:"d", strArray:[], intArray: [], dynArray: [], anyArray: []})
        );

        var strArray = t.strArray;
        strArray.Add("via reference");
        t.strArray.Add("via getField");
        trace(strArray.Count); // 2
        trace(t.strArray.Count); // 2

        var strArray = t.strArray;
        strArray.Add("via reference");
        t.strArray.Add("via getField");
        trace(strArray.Count); // 4
        trace(t.strArray.Count); // 4

        var intArray = t.intArray;
        intArray.Add(0);
        t.intArray.Add(1);
        trace(intArray.Count); // 2
        trace(t.intArray.Count); // 2

        var dynArray = t.dynArray;
        dynArray.Add(0);
        dynArray.Add("test");
        trace(dynArray.Count); // 2
        trace(t.dynArray.Count); // 2

        var anyArray = t.anyArray;
        anyArray.Add(0);
        anyArray.Add("test");
        trace(anyArray.Count); // 2
        trace(t.anyArray.Count); // 2
    }
}