AssemblyScript / assemblyscript

A TypeScript-like language for WebAssembly.
https://www.assemblyscript.org
Apache License 2.0
16.74k stars 653 forks source link

Implement closures #798

Open MaxGraey opened 5 years ago

MaxGraey commented 5 years ago

I decide to start discussion about simple closure implementations.

Some obvious (and may be naive) implementation is using generation class context which I prefer to demonstrate in following example:

declare function externalCall(a: i32, b: i32): void;

function range(a: i32, b: i32, fn: (n: i32) => void): void {
  if (a < b) {
    fn(a);
    range(a + 1, b, fn);
  }
}

export function test(n: i32): void {
  range(0, n, (i: i32) => {
    externalCall(i, n); // capture n
  });
}

which transform to:


// generated
class ClosureContext {
  fn: (ctx: usize, i: i32) => void;
  n: i32; // captured var
  // optinal "self: usize;" is closure instantiate inside instance class method
  parent: ClosureContext | null = null;
}

// generated
function lambdaFn(ctx: usize, i: i32): void {
  externalCall(i, changetype<ClosureContext>(ctx).n);
}

function range(a: i32, b: i32, ctx: usize): void {
  if (a < b) {
    changetype<ClosureContext>(ctx).fn(ctx, a); // replaced from "fn(a)";
    range(a + 1, b, ctx);
  }
}

export function test(n: i32): void {
  // insert
  let ctx = new ClosureContext();
  ctx.fn = lambdaFn;
  ctx.n = n;
  //
  range(0, n, changetype<usize>(ctx));
}

Closure and ClosureContext will not generated when no one variable was captured and use usual anonym functions. ClosureContext will not created when only this was captured. In this case ctx param use to pass this reference.

Other discussions: #563

@dcodeIO @jtenner @willemneal Let me know what you think about this?

dcodeIO commented 4 years ago

Also I see you're still going by the global context route- is there a reason you're doing that instead of what we have now with the special this argument?

Nothing decided there, that's just what I'm used to from earlier discussions, so using something along those lines in pseudo code.

I ask because the global context can cause issues if you call a closure from within a closure

Can also be solved with a temp local:

let t = _env
_env = func.env
call_indirect(func.index, args)
_env = t

But iirc we also figured that there may be optimization advantages by prepending an additional argument instead. Just a bit sad that it mutates the signature (1:1 from source to binary would be nice).

gabriel-fallen commented 4 years ago

@dcodeIO I see, makes sense. Thanks for bringing me up to speed.

But then you're going to have an FFI layer for calling vanilla (external) WASM functions?

dcodeIO commented 4 years ago

But then you're going to have an FFI layer for calling vanilla (external) WASM functions?

The same argument can also be made about __setArgumentsLength. While that's still there, we don't gain much by prepending an argument on closures but not for varargs calls:

// with FFI for everything
let someVarargsClosure = ...
__setArgumentsLength(2)
__setClosureEnvironment(theEnv)
someVarargsClosure(...args)

// with prepended args for everything
let someVarargsClosure = ...
someVarargsClosure(2, theEnv, ...args)

Ultimately we'll want all of this to be at least consistent I guess.

DuncanUszkay1 commented 4 years ago

But then you're going to have an FFI layer for calling vanilla (external) WASM functions?

You can still call exported functions directly, but not returned function objects.

But otherwise, that's true. It's just part and parcel of treating functions like objects. For context, we had an original solution which allowed both closures and non-closures to exist, and had an ID scheme which separated the two. This ended up being fairly complicated and unintuitive- especially once we started getting into the GC.

And dcode brings up a good example of where that's already the case with variable argument lengths

DuncanUszkay1 commented 4 years ago

Nothing decided there, that's just what I'm used to from earlier discussions, so using something along those lines in pseudo code.

Right, okay. Just want to make sure that we're aligned on that so we don't get to the finish line and decide we need to change it if that can be avoided

DuncanUszkay1 commented 3 years ago

Circling back to this, trying to come up with a plan. I previously discussed with @torch2424 about just porting what we had before into the new first class function paradigm but I'm having a hard time wrapping my head around how we're going to do that right- I think instead we should rethink what we're doing. Here's a plan which I think would do it:

Plan for Closures

Overview

Functions now consist of an environment and function table index pair. When an identifier is used in a function and at least one descendant, it will be stored within this env. The _env of a first class function refers to the environment of it's parent function. When a function contains closures, it will create an environment held in a local that it injects into those closures as they are created.

We will discover Closures as we compile, and recompile the parent function when we do. This is because:

Already Done

Needs to be Done

Questions

Hopefully that's not too long or confusing- appreciate any and all comments

dcodeIO commented 3 years ago

It needs to implement __visit for GC purposes from my understanding. I believe this means we'll want to implement it as a class

The way this works is that PureRC performs an operation, like decrement, markGray, scan, scanBlack or collectWhite on an object, without knowing the kind of the object (and its layout) just yet. Hence it just calls __visit_members, a compiler-generated function with a big switch forwarding the operation to the actual block of code to execute, typically one function per class and then closure. As such, a closure needs a unique id to put into the switch, and a compiler-generated block of code to __visit each member, but there is no class needed for that. The layout of the environment can be encoded in 1) the loads and stores, including any levels of indirection, accessing its values and 2) the concrete __visit_impl called by __visit_members, one per unique closure.

Ideally it's simple to write out field accesses for the class before we know all the fields

This will work for leaf inner functions (those closing over variables), but not in all cases for the outer function (those whose variables are closed over). Consider for example:

function outer(): void {
  var x = 1;
  ++x; // x is not yet known to be part of a closure environment
  function inner(): void {
    ++x;
  }
  inner(); // now x is known to be part of a closure environment
}

In these cases, the outer function either needs to be post-processed to replace accesses to x with an access relative to env, or needs to be recompiled. Note that the level of nesting here can be arbitrary, in that each inner function can introduce new locals that'll only be known to be part of a new env later. Also note that, while this looks trivial, as soon as the ++x or inner() is behind a static check, we still need to "compile" to know the actual reachable contents of the function.

torch2424 commented 3 years ago

@DuncanUszkay1 Ah! Thanks for looking into an alternative approach for closures! Yeah to be honest, I got everything rebased, but was having a bit of trouble rationalizing how our previous work would be ported (It could be, and honestly woulnd't be hard to like I said, but it's just a lot of moving pieces since I didn't write most of the code myself.)

This plan looks good to me, and stoked to see @dcodeIO already helping fill in the gaps :)

DuncanUszkay1 commented 3 years ago

In these cases, the outer function either needs to be post-processed to replace accesses to x with an access relative to env

The outer function will be recompiled when it's detected that there is an inner closure that depends on the outer function or it's parents. That includes creating the env and replacing accesses with loads from the env

Note that the level of nesting here can be arbitrary

This plan should handle arbitrary nesting (albeit with some redundancy). The decision ends up being whether or not an inner function needs to be created as it's own independent first class function object with an environment. In this plan, the answer to that is yes as long as the inner function is a closure or has a descendent function which is a closure that depends on a variable defined in the outer function or it's parents. The downside of this approach is that you could conceivably have a longer chain of environments than you actually need like in this example:

function outer():  void {
  var a = 1;
  function inner(): void {
    function innerInner(): void {
      a++
    }
  }
}

In this case we'd have a chain of two envs when we really only need one:

Worth noting this process can be repeated any number of times, as long as innerInner knows how many layers up the variable it wants is and each function knows whether or not it will need an env, which should be doable with alterations of the lookup process (or similar)

So there's some redundancy because the env of inner is just acting as a middleman- but it greatly simplifies the process when arbitrary nesting is involved.

I think whether or not this compromise of extra environments linking variables together when a closure depends on the parent of its parent is OK for now is an important question before we move forward.

DuncanUszkay1 commented 3 years ago

Some updates from the Working Group:

pongstylin commented 3 years ago

Peanut gallery here. Let me know if there is another issue that addresses this question. Any chance we can improve upon how JavaScript deals with closures? That is, functions are not closures by default and you must declare the variables that are visible to a closure. You might use the same syntax used in PHP. Here is an example using an arrow function:

let i = 0; fn = (j) use (i) => i += j; fn(5); // i === 5

The reason I ask is because I like how AssemblyScript is more strict than JavaScript, improving processing performance. By making closures more strict, we can also maintain improved memory performance. Otherwise, all variables in the scope of a closure cannot be cleaned up by the GC even if they are not used by a closure.

MaxGraey commented 3 years ago

In most cases closure can be eliminated via some set of transformations like lambda lifting (partial extending), lambda dropping (partial inlining), eta / beta reductions, inlining and partial evaluation. In rest of cases when we should perform full closure conversion routine we could inform user about it as part of --pedantic mode for example.

Explicit declaration of free vars like this done in C++ or PHP doesn't make things simpler or more performant. Modern languages like Rust, Swift, Zig, Crystal and etc avoid such explicit declarations but although it would make more sense in them than for JS / TS due to that langs support capturing by value and by reference while JS/TS closures always capture by reference.

pongstylin commented 3 years ago

@MaxGraey Are you saying it is already a plan to optimize functions by detecting the presence of used scope variables (or eval) and allow unused scope variables to be cleaned when they go out of scope (and the closure hasn't been)?

If that is the case, then I guess there is no practical advantage for declared closed variables. Just stylistic preference.

MaxGraey commented 3 years ago

Are you saying it is already a plan to optimize functions by detecting the presence of used scope variables (or eval) and allow unused scope variables to be cleaned when they go out of scope (and the closure hasn't been)?

Yes, this will be possible after the middle intermediate representation has been completed (currently it in WIP). With mid IR find, analyse of free vars and optimize closures is just special separate pass (or group of passes).

ghost commented 3 years ago

Not an AS user, but if you want performance and fine low-level control, it might be best to have separate function types, or to change the meaning of arrow functions vs "normal" functions. If all functions are functors, then wouldn't ASC assume JS imports are also functors? This would also force explicit casting between functions and functors, ex:

Let's take this AS function:

function for_range(
    start: u32,
    stop: u32,
    callback: (n: u32) => void
) {
    for (let iterator = start; iterator < stop; ++iterator) {
        callback(iterator);
    }
}

if this were called with a "raw" function, ex, from the host side, then the code would break.

It might make more sense to do something more like:

function for_range(
    start: u32,
    stop: u32,
    callback: RawFunction<(n: u32) => void>
) {
    for (let iterator: u32 = start; iterator < stop; ++iterator) {
        callback(iterator);
    }
}

function foo(): void {
    let i: u32 = 0;

    for_range(0, 10, (n: u32): void => { log(++i) });
    // Error: "(n: u32) => void" is not assignable to RawFunction<(n: u32) => void>
}

Then again, that might not work out at all.


Also, you'd think that browsers, which have been optimized 24/7 to execute JS for the last decade, would know what locals to capture, and what not to, but their JiT compilers seem to make mistakes too often.

Explicit capturing might provide stronger guarantees about what is captured, by preventing potentially arbitrary compiler decisions.

MaxGraey commented 3 years ago

Also, you'd think that browsers, which have been optimized 24/7 to execute JS for the last decade, would know what locals to capture, and what not to, but their JiT compilers seem to make mistakes too often.

Found which function capture environment (has free variables) and actually is closure and which is not is pretty simple and fully precise and doing in ahead of time during analysis inside AS module. Main problem with functions / closures which cross wasm<->host boundary

ghost commented 3 years ago

...is pretty simple and fully precise and doing in ahead of time during analysis inside AS module.

That is a pretty bold assertion. ASC performance will likely slow down too. Also, each closure context will be different, so passing functions around won't be possible, at least not without dynamic typing/type pruning. Take that for_range function from earlier, each function would need to be called with a new closure type.

Next, consider this:

type ClosureStruct = {
        set: (n: i32) => void,
        get: () => i32
};

function foo(x: i32): ClosureStruct {
    return {
        set: (new_value: i32): void => {
            x = new_value;
        },
        get: (): i32 => x
    };
}

function bar(): void {
    const struct: ClosureStruct = foo(100);

    struct.set(10);
    struct.get(10);
}

Would this be accessing invalid data? How much GC will this require on ASC's end? AS might benefit from Wasm function reference binding.

MaxGraey commented 3 years ago

Take that for_range function from earlier, each function would need to be called with a new closure type.

For for_range we even don't need a closure. After lambda lifting it will be simple indirect/direct call.

MaxGraey commented 3 years ago

C++ can't do this usually. But Rust which have own intermediate representation called MIR provide similar transforms. See this here: https://godbolt.org/z/4qsM68

ghost commented 3 years ago

C++ can't do this usually.

C++ couldn't do it because of the std::function class; generally std::function calls will be a magnitude worse than function calls. An untyped lambda easily causes the same codegen as Rust did. See here: https://godbolt.org/z/6Y8h4T Although, of course, it differs between compilers.

But, good luck with whatever implementation that the ASC team uses for AS closures!

MaxGraey commented 3 years ago

C++ couldn't do it because of the std::function class; generally std::function calls will be a magnitude worse than function calls.

Ah right! I'm not very familiar with modern C++. But if you check llvm-ir you got a lot codegen initially. So all credit lies with LLVM in this case and its further optimizations.

But, good luck with whatever implementation that the ASC team uses for AS closures!

Thanks!

Dudeplayz commented 2 years ago

What's the current state of closures?

How can I workaround the following code snippet:

export function addClockEvent(cpu: CPU, callbackId: u32, cycles: u32): AVRClockEventCallback {
    return cpu.addClockEvent(() => callClockEventCallback(callbackId), cycles)
}

I'm porting a library to AS, but there is the point where a function is passed as a parameter. Because I can't pass a function to WASM, I am building a map with a callbackId that references the correct function on the JS side and calling this in an imported JS function. Without the closure I can't pass the parameter into it and modifying the function signature is not possible because it is deeply integrated.

MaxGraey commented 2 years ago

How can I workaround the following code snippet:

Just use temporary global variable:

let _callbackId: u32;

export function addClockEvent(cpu: CPU, callbackId: u32, cycles: u32): AVRClockEventCallback {
    _callbackId = callbackId;
    return cpu.addClockEvent(() => callClockEventCallback(_callbackId), cycles)
}

But this has some limitations. You can't use this approach for recursive calls and for serial calls

Dudeplayz commented 2 years ago

I will get multiple calls to this function, so the global variable would be overwriten every time it get called. Is there any other method to do it or have I misunderstood something? For me it currently looks like I have to change the callback signature which is very problematic and blocking. Having closures would be nice.

urish commented 2 years ago

Here's the approach I used in a similar situation. It replaces the callback parameter with an interface, so you can easily create classes in that implement the interface and store additional information (sort of manual closure).

interface CallbackInstance {
  execute(): void;
}

class ClockEventCallback implement CallbackInstance {
  constructor(readonly callbackId: u32) {}

  execute(): void {
    callClockEventCallback(this.callbackId); // or whatever custom login you want to do.
  }
}

Then change the signature of addClockEvent to accept a CallbackInstance and call its execute method:

  addClockEvent(callback: CallbackInstance , cycles: u32): void {
    // now store callback somewhere, and use callback.execute() to call it.
  }

Finally, call it like:

cpu.addClockEvent(new ClockEventCallback(whatever), 1000);

I hope this is helpful!

Dudeplayz commented 2 years ago

I used the approach @urish mentioned. I was hoping there is an alternative because it changes the signature of the method, which results in a changed API.

Dudeplayz commented 2 years ago

I got another error while replacing a closure with an interface, the code looks like the following:

//callReadHook and callWriteHook are imported module functions
export class ExternalCPUMemoryReadHook implements CPUMemoryReadHook {
    call(addr: u16): u8 {
        return callReadHook(addr);
    }
}

export class ExternalCPUMemoryWriteHook implements CPUMemoryHook {
    call(value: u8, oldValue: u8, addr: u16, mask: u8): boolean {
        return callWriteHook(value, oldValue, addr, mask);
    }
}

export function addReadHook(cpu: CPU, addr: u32): void {
    cpu.readHooks.set(addr, new ExternalCPUMemoryReadHook());
}

export function addWriteHook(cpu: CPU, addr: u32): void {
    cpu.writeHooks.set(addr, new ExternalCPUMemoryWriteHook());
}

// cpu.readHooks and cpu.writeHooks are of the following types:
export class CPUMemoryHooks extends Map<u32, CPUMemoryHook> {
}
export class CPUMemoryReadHooks extends Map<u32, CPUMemoryReadHook> {
}

Just the import breaks the build, removing the exported functions still crashes. Here is the error log:

asc assembly/index.ts -b build/untouched.wasm -t build/untouched.wat -d build/module.d.ts --sourceMap C://Users/xxred/IdeaProjects/avr8js-research/build/untouched.wasm.map --debug --exportRuntime

▌ Whoops, the AssemblyScript compiler has crashed during buildTSD :-( 
▌ 
▌ Here is the stack trace hinting at the problem, perhaps it's useful?
▌ 
▌ AssertionError: assertion failed
▌     at assert (C:\Users\xxred\IdeaProjects\avr8js-research\node_modules\assemblyscript\dist\webpack:\assemblyscript\std\portable\index.js:201:11)
▌     at u.visitElement (C:\Users\xxred\IdeaProjects\avr8js-research\node_modules\assemblyscript\dist\webpack:\assemblyscript\src\definitions.ts:143:16)
▌     at u.visitFile (C:\Users\xxred\IdeaProjects\avr8js-research\node_modules\assemblyscript\dist\webpack:\assemblyscript\src\definitions.ts:80:14)
▌     at u.walk (C:\Users\xxred\IdeaProjects\avr8js-research\node_modules\assemblyscript\dist\webpack:\assemblyscript\src\definitions.ts:68:65)
▌     at u.build (C:\Users\xxred\IdeaProjects\avr8js-research\node_modules\assemblyscript\dist\webpack:\assemblyscript\src\definitions.ts:625:10)
▌     at Function.build (C:\Users\xxred\IdeaProjects\avr8js-research\node_modules\assemblyscript\dist\webpack:\assemblyscript\src\definitions.ts:376:36)
▌     at Object.t.buildTSD (C:\Users\xxred\IdeaProjects\avr8js-research\node_modules\assemblyscript\dist\webpack:\assemblyscript\src\index.ts:290:21)
▌     at C:\Users\xxred\IdeaProjects\avr8js-research\node_modules\assemblyscript\cli\asc.js:1173:34
▌     at measure (C:\Users\xxred\IdeaProjects\avr8js-research\node_modules\assemblyscript\cli\asc.js:1409:3)
▌     at Object.main (C:\Users\xxred\IdeaProjects\avr8js-research\node_modules\assemblyscript\cli\asc.js:1171:27)
▌ 
▌ If it refers to the dist files, try to 'npm install source-map-support' and
▌ run again, which should then show the actual code location in the sources.
▌ 
▌ If you see where the error is, feel free to send us a pull request. If not,
▌ please let us know: https://github.com/AssemblyScript/assemblyscript/issues
▌ 
▌ Thank you!
tamusjroyce commented 2 years ago

Any reason why closures can’t be rewritten as classes & pass through constructor. As far as I can see in the ast, there should be a 1 to 1 relationship going from closures & currying to classes. For example, Typescript classes to es3 translation results in closures.

alienself commented 1 year ago

Any updates on adding closure support? What is the blocker?

luffyfly commented 1 year ago

Any updates on adding closure support? What is the problem?

luffyfly commented 1 year ago

I planned to write a game engine use AS and I have programmed for a month, but now I give up AS and choose Rust because lack of supports for closures. Just let you know, It's not reasonable to not support closures in any case.

gabriel-fallen commented 1 year ago

@luffyfly as a Functional Programming enthusiast I totally agree that closures are super useful and handy. But are you sure closures in Rust are implemented for Wasm efficient enough for your game engine?

As a general observation, in the current state Wasm presents a poor target for Functional Programming: doesn't efficiently support heavy use of function pointers, closures and GC. Moreover AssemblyScript is not really a Functional Language. And if you're programming in OO style it's not hard to implement purpose-built closures out of objects (going as far as Strategy Pattern if really needed).

luffyfly commented 1 year ago

@gabriel-fallen I'm pretty sure about that, because When I call the fetch() function of the web host from Wasm, I have to use closures as callbacks to avoid blocking the main thread.

MaxGraey commented 1 year ago

I have to use closures as callbacks to avoid blocking the main thread.

Just to note. Closures and async (cooperative multitasking) are two independent things

CryZe commented 1 year ago

doesn't efficiently support heavy use of function pointers

Closures in Rust are rarely represented as function pointers and instead as anonymous structs that have a "call" method that is statically dispatched rather than dynamically in almost all cases. The method is almost always inlined too, to the point where almost always you can't even tell it ever was a closure to begin with. It's very strongly a zero cost abstraction there.

Though I'm also wondering why you say function pointers aren't well supported by Wasm. As far as I know they are just functions stored in the function table and then called via an call_indirect and an index into the table. I don't see a problem with that approach, other than having to know all the functions that are going to be called indirectly ahead of time, but that's not a problem for a compiled language.

GC though is definitely a problem with closures for languages that don't have an ownership system.

luffyfly commented 1 year ago

@CryZe I have tried function pointers, and It worked but It is so weird and difficult to use.

luffyfly commented 1 year ago

@MaxGraey If AS has aysnc implementations, I will not take closures.

MaxGraey commented 1 year ago

@luffyfly Again closures and aysnc is absolutely different things. Dsynchrony without calling "then" in returned Promise but through "await" does not use dispatch at all. You decide what you want.

luffyfly commented 1 year ago

@MaxGraey See https://github.com/AssemblyScript/assemblyscript/issues/376#issuecomment-447260513. Actually, I want both.

butterunderflow commented 1 year ago

What is the current state of the closure implementation? I've seen several PRs related to closures, but none of them have been merged. I'm curious about what is the biggest block of closure implementation.

pchasco commented 1 year ago

I am interested in using AssemblyScript, but must admit I am reluctant to use until closures are implemented. Yes, obviously there are workarounds, but they are inconvenient and add to code bloat. All TypeScript developers whom I assume this project hopes to pilfer are accustomed to the convenience of closures and to some degree of functional programming.

Many languages that target WASM already support closures, so I am confused as to what it is about AssemblyScript specifically that prevents closures from being implemented. I would be surprised if some limitation in WASM were a real roadblock. I saw some mentions of concerns around performance and allocations, which are reasonable, but I wonder if this is one of those situations where a bit of runtime efficiency is sacrificed for developer convenience? And the implementation wouldn't necessarily be the perfect implementation out the gate, so long as it is functionally correct. It could be refined or replaced later when new WASM features are introduced, or if a better implementation is worked out.

alienself commented 5 months ago

Any update on this? Another year without closures 😢

Will 2024 the year of closures and thread support in AssemblyScript?

LeXXik commented 1 week ago

Upvoting this. Array find functions, like findIndex are pretty much useless without closures, as you can't compare an object property with some passed variable.