AssemblyScript / assemblyscript

A TypeScript-like language for WebAssembly.
https://www.assemblyscript.org
Apache License 2.0

Proposal for struct-like datatypes #2858

Open Mudloop opened 1 week ago

Mudloop commented 1 week ago

Feature suggestion

Hi,

I had this idea for C#-style structs that I believe could be implemented using a Transform. I know structs have been brought up a couple of times, so there's clearly some demand.

Take this code:

@struct export class Vector2 {
    x: f32 = 0;
    y: f32 = 0;
    constructor(x: f32, y: f32) {
        this.x = x;
        this.y = y;
    }
}
export class Test {
    position: Vector2 = new Vector2(0, 0);
}

That could be transformed into this:

@final @unmanaged export class Vector2 {
    x: f32 = 0;
    y: f32 = 0;
    @inline constructor(x: f32, y: f32) {
        // offsetof<Vector2>() is the instance size; sizeof<Vector2>() would be the pointer size.
        let ret = changetype<Vector2>(memory.data(offsetof<Vector2>()));
        ret.x = x;
        ret.y = y;
        return ret;
    }
}
export class Test {
    private _position_x: f32 = 0;
    private _position_y: f32 = 0;
    @inline get position(): Vector2 {
        return new Vector2(this._position_x, this._position_y);
    }
    set position(value: Vector2) {
        this._position_x = value.x;
        this._position_y = value.y;
    }
}

This essentially makes the Vector2 class allocate on the stack when instantiated, while still allowing it to be assigned to fields of other objects.
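
To make the accessor idea concrete, here is a minimal sketch in plain TypeScript (not AssemblyScript, and not actual transform output). It assumes the struct's fields are simply flattened into the owning class and only models the copy-in / copy-out behavior; the names mirror the example above.

```typescript
// Plain TypeScript model of the flattening idea; nothing here is AssemblyScript-specific.
class Vector2 {
  x: number;
  y: number;
  constructor(x: number, y: number) {
    this.x = x;
    this.y = y;
  }
}

class Test {
  // The struct's fields are stored inline in the owner.
  private _position_x = 0;
  private _position_y = 0;

  // Reading builds a fresh value, so callers never share storage with the owner.
  get position(): Vector2 {
    return new Vector2(this._position_x, this._position_y);
  }

  // Writing copies the fields in, so the owner never keeps the caller's object.
  set position(value: Vector2) {
    this._position_x = value.x;
    this._position_y = value.y;
  }
}

const a = new Test();
const b = new Test();
a.position = new Vector2(1, 2);
b.position = a.position; // copies the two numbers, not a reference

const tmp = a.position;
tmp.x = 99;              // changes only the temporary copy, not a
```

One consequence of this model is that writing `a.position.x = 5` would also only touch a temporary copy, so a real transform would probably need to rewrite such member writes as well.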

Obviously it gets more complex once nested structs and generics are involved. Things like offsetof<Test>("position") would also need to be transformed, and respecting the original constructor's code would become tricky.

Also, I believe it would need to turn things like this:

let v1 = new Vector2(0, 0);
let v2 = v1;

Into something like this:

let v1 = new Vector2(0, 0);
let v2 = changetype<Vector2>(memory.data(offsetof<Vector2>()));
// memory.copy takes (dest, src, n): copy v1's bytes into v2.
memory.copy(changetype<usize>(v2), changetype<usize>(v1), offsetof<Vector2>());
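
The copy-on-assignment step can be modeled in plain TypeScript, with a byte buffer standing in for linear memory and numeric offsets standing in for pointers. This is a rough sketch under those assumptions; `copyBytes`, the buffer size, and the two addresses are made up for illustration. Like AssemblyScript's memory.copy, the helper takes the destination first.

```typescript
// Plain TypeScript model: a byte buffer stands in for linear memory,
// and numeric offsets stand in for pointers.
const memory = new DataView(new ArrayBuffer(32));
const VECTOR2_SIZE = 8; // two f32 fields, 4 bytes each

// Destination first, then source, then byte count.
function copyBytes(dest: number, src: number, n: number): void {
  new Uint8Array(memory.buffer).copyWithin(dest, src, src + n);
}

const v1 = 0;            // hypothetical address of v1
const v2 = VECTOR2_SIZE; // hypothetical address of v2, a separate block

memory.setFloat32(v1, 1.5, true);     // v1.x
memory.setFloat32(v1 + 4, 2.5, true); // v1.y

copyBytes(v2, v1, VECTOR2_SIZE);      // "let v2 = v1" as a value copy

memory.setFloat32(v1, 7, true);       // a later change to v1.x
// v2 still holds (1.5, 2.5) because it owns its own bytes
```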

Not entirely sure how to deal with passing it around as a parameter or returning it, because my understanding of how memory works in AssemblyScript is a bit limited.

Would love to hear some thoughts on this idea.

Mudloop commented 1 week ago

For nested structs, something like this:

@struct export class StructA {
    val: f32 = 0;
    constructor(val: f32) {
        this.val = val;
    }
}
@struct export class StructB {
    val: f32 = 0;
    nested: StructA;
    constructor(val: f32, nested: StructA) {
        this.val = val;
        this.nested = nested;
    }
}
export class Tester {
    struct: StructB = new StructB(0, new StructA(0));
}

Could become:

@final @unmanaged export class StructA {
    val: f32 = 0;
    /* @ts-ignore */
    @inline constructor(val: f32) {
        // offsetof<StructA>() is the instance size; sizeof<StructA>() would be the pointer size.
        let ret = changetype<StructA>(memory.data(offsetof<StructA>()));
        ret.val = val;
        return ret;
    }
}
@final @unmanaged export class StructB {
    _val: f32 = 0;
    _nested_val: f32 = 0;
    @inline
    get val(): f32 { return this._val; }
    set val(value: f32) { this._val = value; }
    @inline
    get nested(): StructA { return new StructA(this._nested_val); }
    set nested(value: StructA) { this._nested_val = value.val; }
    /* @ts-ignore */
    @inline constructor(val:f32, nested: StructA) {
        // offsetof<StructB>() is the instance size; sizeof<StructB>() would be the pointer size.
        let ret = changetype<StructB>(memory.data(offsetof<StructB>()));
        ret.val = val;
        ret.nested = nested;
        return ret;
    }
}
export class Tester {
    _struct_val: f32 = 0;
    _struct_nested_val: f32 = 0;
    @inline get struct(): StructB {
        return new StructB(this._struct_val, new StructA(this._struct_nested_val));
    }
    set struct(value: StructB) {
        this._struct_val = value.val;
        this._struct_nested_val = value.nested.val;
    }
}

This makes my head spin a little and I might have made some mistakes, so consider it pseudo-code. I hope the concept is clear.

Edit: it probably shouldn't call constructors to read the encapsulated "structs" and should copy the memory directly instead, because we wouldn't want the constructor to run every time a struct gets copied.

CountBleck commented 1 week ago

One caveat is that there is no stack in AS, except for the shadow stack used for garbage collection.

JairusSW commented 1 week ago

@CountBleck, I suppose you could allocate a page or two and call that the stack, like https://github.com/fabricio-p/as-malloc does.

CountBleck commented 1 week ago

I believe this was discussed elsewhere and a long while back, but multi-value would probably be better suited for AS. Binaryen implements multi-value using a special tuple type, so I believe you can have tuples as local variables (that get exploded into their constituent variables).

That likely won't solve this use case though, since it would be pass-by-value and not pass-by-reference, and I'm not sure whether nesting tuples is supported at all.
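
The pass-by-value behavior described here can be illustrated with tuples in plain TypeScript. This is only a semantic model: TypeScript tuples are heap arrays, whereas the Binaryen tuple locals mentioned above are exploded into separate variables, and the `Vec2` alias and `add` function are made up for illustration.

```typescript
// A two-component value passed and returned as a tuple.
type Vec2 = readonly [number, number];

// Takes two values and returns a new one; nothing is shared with the caller.
function add(a: Vec2, b: Vec2): Vec2 {
  return [a[0] + b[0], a[1] + b[1]];
}

// The result is immediately split into two separate locals,
// loosely mirroring how a tuple local is exploded into its parts.
const [px, py] = add([1, 2], [3, 4]);
```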

Another good fit would be GC types, which are pass-by-reference and support nesting, but they can't be stored in regular classes since they're opaque.

@JairusSW you could definitely use memory.data(N) to preallocate a page or two and have a transform that instruments the allocations and resets a global stack pointer on function returns...returning structs might be a bit more involved though :P
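
A rough model of that idea in plain TypeScript, assuming a fixed preallocated region and a global stack pointer that is restored when a function returns. `stackAlloc`, `withFrame`, and `STACK_SIZE` are hypothetical names, not AssemblyScript APIs; `withFrame` stands in for what an instrumenting transform might wrap around each function body.

```typescript
// Model: a preallocated region with a stack pointer that is reset on function return.
const STACK_SIZE = 1024; // hypothetical size of the reserved region
const stack = new DataView(new ArrayBuffer(STACK_SIZE));
let stackPointer = 0;    // next free offset; this model grows upward

// Bump-allocate `size` bytes and return their offset.
function stackAlloc(size: number): number {
  const aligned = (size + 7) & ~7; // round up to 8-byte alignment
  if (stackPointer + aligned > STACK_SIZE) throw new RangeError("stack overflow");
  const ptr = stackPointer;
  stackPointer += aligned;
  return ptr;
}

// Save the stack pointer, run the body, then restore it on the way out.
function withFrame<T>(body: () => T): T {
  const saved = stackPointer;
  try {
    return body();
  } finally {
    stackPointer = saved; // everything allocated inside body is released
  }
}

// Uses a temporary Vector2 (two f32) that lives only for the duration of the call.
function lengthOfSum(ax: number, ay: number, bx: number, by: number): number {
  return withFrame(() => {
    const v = stackAlloc(8);
    stack.setFloat32(v, ax + bx, true);
    stack.setFloat32(v + 4, ay + by, true);
    const x = stack.getFloat32(v, true);
    const y = stack.getFloat32(v + 4, true);
    return Math.sqrt(x * x + y * y);
  });
}
```

The function returns a plain number on purpose. Returning the offset `v` itself would hand back memory that the frame reset has already released, which is presumably why returning structs is the more involved part.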

Mudloop commented 1 week ago

> One caveat is that there is no stack in AS, except for the shadow stack used for garbage collection.

Wait, that's confusing me. What is __stack_pointer for, then? I thought calling memory.data reserved some memory on the stack, which would be freed once the current function exits. In contrast, I thought heap.alloc reserves memory permanently (until manually freed). Is that wrong?

CountBleck commented 1 week ago

__stack_pointer is for that shadow stack I mentioned. The shadow stack is there so managed objects in local variables don't get prematurely garbage collected.

memory.data(123) reserves a block of memory at compile-time, not unlike a global uint8_t some_data[123] = {0}; declaration in C.

Mudloop commented 1 week ago

> __stack_pointer is for that shadow stack I mentioned. The shadow stack is there so managed objects in local variables don't get prematurely garbage collected.
>
> memory.data(123) reserves a block of memory at compile-time, not unlike a global uint8_t some_data[123] = {0}; declaration in C.

Oh ok, it's slowly starting to make sense.

Correct me if I'm wrong, but for the use case I showed, I think this wouldn't actually be a problem: since the constructor is inlined, the "reinterpreted" Vector2 it returns will still be unique.

My main reason for wanting something like this (besides GC / performance) is that without value types, doing enemy1.position = enemy2.position would lock their positions together unless that's handled by setters. That can easily lead to bugs.
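
The aliasing problem is easy to reproduce in plain TypeScript, which has the same reference semantics for class instances. A minimal sketch, with a hypothetical `Enemy` class:

```typescript
class Vector2 {
  x: number;
  y: number;
  constructor(x: number, y: number) {
    this.x = x;
    this.y = y;
  }
}

class Enemy {
  position: Vector2 = new Vector2(0, 0);
}

const enemy1 = new Enemy();
const enemy2 = new Enemy();
enemy2.position = new Vector2(3, 4);

enemy1.position = enemy2.position; // both fields now refer to one object
enemy2.position.x = 10;            // enemy1.position.x is now 10 as well

// Without value types, the copy has to be written out by hand.
const enemy3 = new Enemy();
enemy3.position = new Vector2(enemy2.position.x, enemy2.position.y);
enemy2.position.y = 20;            // enemy3.position.y stays 4
```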

Tuples / multi-value sound like they would work, but I'm unclear on whether that's implemented in AssemblyScript at this point, or whether it's planned.

Mudloop commented 1 week ago

Oh, I see the flaw with this approach. If I called the inlined constructor in a loop and added the results to an array, they would all be the same reference / pointer. Unless of course that got transformed too, but yeah, the complexity adds up quickly.
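
The flaw can be modeled in plain TypeScript, assuming (as described earlier in the thread) that one memory.data call site corresponds to one block reserved at compile time. `STATIC_SLOT` and `makeVector2` are hypothetical stand-ins for that block and for the inlined constructor.

```typescript
// A byte buffer stands in for linear memory; offsets stand in for pointers.
const memory = new DataView(new ArrayBuffer(64));
const STATIC_SLOT = 16; // hypothetical fixed address, reserved once for this call site

// Stand-in for the inlined constructor: it always writes into the same block.
function makeVector2(x: number, y: number): number {
  memory.setFloat32(STATIC_SLOT, x, true);
  memory.setFloat32(STATIC_SLOT + 4, y, true);
  return STATIC_SLOT; // always the same "pointer"
}

const pointers: number[] = [];
for (let i = 0; i < 3; i++) {
  pointers.push(makeVector2(i, i));
}
// Every entry is the same address, and the block holds only the last write (2, 2).
```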

Guess I’ll just wait for tuple support.