AssemblyScript / assemblyscript

A TypeScript-like language for WebAssembly.
https://www.assemblyscript.org
Apache License 2.0

Implement closures #798

Open MaxGraey opened 4 years ago

MaxGraey commented 4 years ago

I'd like to start a discussion about a simple closure implementation.

An obvious (and maybe naive) implementation generates a class-based context, which I'll demonstrate with the following example:

declare function externalCall(a: i32, b: i32): void;

function range(a: i32, b: i32, fn: (n: i32) => void): void {
  if (a < b) {
    fn(a);
    range(a + 1, b, fn);
  }
}

export function test(n: i32): void {
  range(0, n, (i: i32) => {
    externalCall(i, n); // capture n
  });
}

which transform to:


// generated
class ClosureContext {
  fn: (ctx: usize, i: i32) => void;
  n: i32; // captured var
  // optional "self: usize;" if the closure is instantiated inside an instance class method
  parent: ClosureContext | null = null;
}

// generated
function lambdaFn(ctx: usize, i: i32): void {
  externalCall(i, changetype<ClosureContext>(ctx).n);
}

function range(a: i32, b: i32, ctx: usize): void {
  if (a < b) {
    changetype<ClosureContext>(ctx).fn(ctx, a); // replaced from "fn(a)";
    range(a + 1, b, ctx);
  }
}

export function test(n: i32): void {
  // insert
  let ctx = new ClosureContext();
  ctx.fn = lambdaFn;
  ctx.n = n;
  //
  range(0, n, changetype<usize>(ctx));
}

A closure and its ClosureContext will not be generated when no variables are captured; in that case ordinary anonymous functions are used. A ClosureContext will also not be created when only this is captured; in that case the ctx param is used to pass the this reference.
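For illustration, here is a plain TypeScript simulation of this lowering, with object references standing in for usize pointers (names mirror the example; the runtime harness is hypothetical):

```typescript
// Simulated ClosureContext: holds the function reference and the captured n
type Ctx = { fn: (ctx: Ctx, i: number) => void; n: number };

// stands in for the external import; records each call for inspection
const calls: Array<[number, number]> = [];
function externalCall(a: number, b: number): void {
  calls.push([a, b]);
}

// generated lambda: reads the captured n through the context
function lambdaFn(ctx: Ctx, i: number): void {
  externalCall(i, ctx.n);
}

function range(a: number, b: number, ctx: Ctx): void {
  if (a < b) {
    ctx.fn(ctx, a); // replaced from "fn(a)"
    range(a + 1, b, ctx);
  }
}

function test(n: number): void {
  const ctx: Ctx = { fn: lambdaFn, n }; // inserted context allocation
  range(0, n, ctx);
}

test(3);
console.log(calls); // [[0,3],[1,3],[2,3]]
```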

Other discussions: #563

@dcodeIO @jtenner @willemneal Let me know what you think about this.

jtenner commented 4 years ago

I think we need to override the meaning of call_indirect.

jtenner commented 4 years ago

How would this transform?

function add(a: i32, b: i32): () => i32 {
  return () => a + b;
}
MaxGraey commented 4 years ago

@jtenner


class ClosureContext {
  fn: (ctx: usize) => i32;
  a: i32;
  b: i32;
  parent: ClosureContext | null = null;
}

function lambdaAdd(ctx: usize): i32 {
  return changetype<ClosureContext>(ctx).a + changetype<ClosureContext>(ctx).b;
}

function add(a: i32, b: i32): ClosureContext {
  let ctx = new ClosureContext();
  ctx.fn = lambdaAdd;
  ctx.a = a;
  ctx.b = b;
  return ctx;
}
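From the caller's side, invoking the returned context could then be simulated in plain TypeScript like this (object references instead of usize; hypothetical names):

```typescript
// Simulated context: carries the function reference plus captured a and b
type AddCtx = { fn: (ctx: AddCtx) => number; a: number; b: number };

// generated lambda: reads both captures through the context
function lambdaAdd(ctx: AddCtx): number {
  return ctx.a + ctx.b;
}

function add(a: number, b: number): AddCtx {
  return { fn: lambdaAdd, a, b };
}

const ctx = add(2, 3);
const sum = ctx.fn(ctx); // the caller passes the context back into its fn
console.log(sum); // 5
```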
MaxGraey commented 4 years ago

Instead

class ClosureContext {
  fn: (ctx: usize) => i32;
  b: i32;
  ...
}

we could just store index for indirect function table

class ClosureContext {
  fnIdx: usize;
  a: i32;
  ...
}

... 

call_indirect(ctx.fnIdx, ...args)
jtenner commented 4 years ago

How does this work with function dispatch?

For instance, let's say I pass the Closure Context out to js as a pointer. How can I re-call it?

jtenner commented 4 years ago

Is there a way to make a table and add entries to it?

MaxGraey commented 4 years ago

@jtenner actually you only need to pass fn / fnIdx. After that, just use the same approach as for anonymous indirect function calls.

EDIT: No, you need to unpack fn from the returned ClosureContext object. We could just provide another utility for the loader.

MaxGraey commented 4 years ago

Next example:

function test(fn: (cb: (x: i32) => void) => void): i32 {
  let n = 0;
  fn(x => { n = x });
  return n;
}

should generate:

class ClosureContext {
  fn: (ctx: usize, x: i32) => void;
  n: i32;
  parent: ClosureContext | null = null;
}

function lambdaFn(ctx: usize, x: i32): void {
   changetype<ClosureContext>(ctx).n = x;
}

function test(fn: (cb: usize) => void): i32 {
  let n = 0;
  let ctx = new ClosureContext();
  ctx.fn = lambdaFn;
  ctx.n = n;
  fn(changetype<usize>(ctx));
  n = ctx.n;
  return n;
}
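The copy-back of the mutated capture can be simulated in plain TypeScript (object references instead of usize pointers; names are hypothetical):

```typescript
// Simulated context: the lambda writes into it, and the host copies
// ctx.n back into its local afterwards
type Ctx = { fn: (ctx: Ctx, x: number) => void; n: number };

function lambdaFn(ctx: Ctx, x: number): void {
  ctx.n = x; // write to the captured variable through the context
}

function test(fn: (cb: (x: number) => void) => void): number {
  let n = 0;
  const ctx: Ctx = { fn: lambdaFn, n };
  fn(x => ctx.fn(ctx, x)); // stands in for passing changetype<usize>(ctx)
  n = ctx.n;               // copy the possibly-mutated capture back
  return n;
}

const result = test(cb => cb(42));
console.log(result); // 42
```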
jtenner commented 4 years ago

Well I'm thinking about aspect collecting function pointers. How will aspect need to work with the function pointers?

jtenner commented 4 years ago

My guess is that passing around the closure context will cause problems with manually using call_indirect like aspect does.

Also, this closure method doesn't handle multiple references to the same local.

let a = 1;
let b = () => a;
let c = () => a += 1;

b and c will get different versions of a.

MaxGraey commented 4 years ago

@jtenner You could get the table and look up the function by index / ref. For example, you have this wasm:

(func $foo (result i32) (i32.const 1234))
(table (export "tbl") anyfunc (elem $foo))

then your JS part:

WebAssembly.instantiateStreaming(fetch('main.wasm')).then(({ instance }) => {
  const table = instance.exports.tbl;
  console.log(table.get(0)());  // 1234
});
jtenner commented 4 years ago

It's likely we will need to allocate enclosed local values in a table or on the heap. Each pointer to those values will need to be stored in a table pointing to the heap.

This idea is naive because the variables can no longer be treated like local variables, since it's possible to modify local values before the function finishes executing.

MaxGraey commented 4 years ago

b and c will get different versions of a.

Why?

let a = 1;
let b = () => a;
let c = () => a += 1;

let br = b(); // 1
let cr = c(); // 2
// assert(a == 2)

Will generate:


class ClosureContextB { fn; a; }
class ClosureContextC { fn; a; }

function lambdaB(ctx: usize): i32 {
   return changetype<ClosureContextB>(ctx).a;
}

function lambdaC(ctx: usize): i32 {
   return changetype<ClosureContextC>(ctx).a += 1;
}

let a = 1;

let ctxB = new ClosureContextB(lambdaB, a);
let br = ctxB.fn(changetype<usize>(ctxB)); // 1
// a = ctxB.a; // 1, unmodified, so unnecessary

let ctxC = new ClosureContextC(lambdaC, a);
let cr = ctxC.fn(changetype<usize>(ctxC)); // 2
a = ctxC.a; // 2
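The copy-back scheme above can be simulated in plain TypeScript (contexts shown as object literals rather than generated classes; names are hypothetical):

```typescript
// Each closure gets its own context holding a snapshot of `a`; after a call
// that may have mutated the capture, the caller copies the value back.
let a = 1;

const lambdaB = (ctx: { a: number }): number => ctx.a;
const lambdaC = (ctx: { a: number }): number => (ctx.a += 1);

const ctxB = { a };
const br = lambdaB(ctxB); // 1
// a = ctxB.a;            // unmodified, so the copy-back is unnecessary

const ctxC = { a };
const cr = lambdaC(ctxC); // 2
a = ctxC.a;               // copy back: a == 2

console.log(br, cr, a); // 1 2 2
```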
jtenner commented 4 years ago

I'm saying a in that example needs to exist on the heap for both b and c to access it, and the closure class needs to contain a Box<i32> that points to the heap location.

MaxGraey commented 4 years ago

No, we don't need to pass plain types by boxed references. Also, I found a pretty clear article. So ClosureContext (LexicalEnvironment) should be modified a little bit to also store a reference to its parent LexicalEnvironment.

dcodeIO commented 4 years ago

Regarding collection of closure contexts: Seems the idea here is to pass around a closure context (containing both the function index and the lexical scope) instead of just a function index. While that can be reference counted, it leads to a situation where something can be called with either a function index or a closure context, for example

function callIt(fn: () => void): void { fn(); }

function nop(): void {}
callIt(nop); // function index

let a: i32;
function ctx(): void { a = 1; }
callIt(ctx); // closure context

which means we'd either have to generate two versions of callIt (one taking a function index, one taking a closure context, otherwise doing the same) or do only closure contexts, requiring every call to callIt to wrap function indexes in a temporary empty closure, which is an unnecessary allocation that is freed immediately.

A better approach might be to utilize the multi-value spec, in that a closure is actually two values, a function index and a lexical environment, with the latter possibly being null.
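Sketched in plain TypeScript, the two-value representation might look like this (tuples stand in for multi-value returns; all names are hypothetical):

```typescript
// A function reference is a pair [tableIndex, env], env being null
// for plain (non-closure) functions.
type Env = { a: number } | null;

// simulated function table: every entry takes an environment parameter
const table: Array<(env: Env) => void> = [
  _ => {},                                  // nop: ignores its env
  env => { (env as { a: number }).a = 1; }, // closure body: writes its capture
];

function callIt(ref: [number, Env]): void {
  const [index, env] = ref;
  table[index](env);
}

callIt([0, null]); // plain function index, no environment needed
const env = { a: 0 };
callIt([1, env]);  // closure: function index + lexical environment
console.log(env.a); // 1
```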

jtenner commented 4 years ago

Yep. The issue I'm going to hit with a multivalue return is when these function pointers need to be utilized in JavaScript.

For instance, I want to call a describe function pointer that is nested. In order to do this from AssemblyScript, I need to export a function callIt that has 0 knowledge of lexical scope.

Edit: Could we have a primitive like lexscopeof(func)? That way I could at least manage it externally in javascript

MaxGraey commented 4 years ago

The multi-value approach is definitely better, but I'm not sure standalone VMs will support it. It's quite complicated even for compiler tools (Binaryen still hasn't fully implemented it). So having a fallback (without multi-value support) seems necessary, even if it's not as efficient and requires some heap allocation.

dcodeIO commented 4 years ago

Maybe one way to work around the allocation is to keep a singleton closure context per non-closure around in static memory. Like, if we know that the table has max 200 elements, make 200 dummies pre-populated with the function index and no lexical scope? Hmm

jtenner commented 4 years ago

Well at least 200 might not be enough. I can imagine taking advantage of this feature in aspect in very terrifying ways

willemneal commented 4 years ago

Firstly, I'd like to follow up on Daniel's example, would something like this work?

type context<T> = T extends Context | T extends Function;
function callIt<T extends context<T>>(fn: T): returnof<T> { // This function would need to change.
  if (isFunction<T>()) {
    if (isVoid<T>()) {
      fn();
      return;
    }
    return fn();
  }
  let ctx = changetype<Context>(fn);
  if (ctx.isVoid) { // Need to add an isVoid property.
    ctx.call();
    return;
  }
  return ctx.call();
}

I read this article a year ago about how Elm handles first class functions in wasm: https://dev.to/briancarroll/elm-functions-in-webassembly-50ak

The big takeaway is that a function context is a function pointer, its current arity (how many arguments it still has to take), and an ArrayBuffer of the arguments that have been passed so far.

When you have a new function let f = new Func(fn) and you then do f.call(...), you pass it the next parameter and you get a new function context which now takes one less argument. This continues until you have one argument left, at which point the function pointer of the context is called. In the context of the article above, everything is immutable, which is why you get a clone of the context plus the new argument. This way the intermediate functions that return functions of a smaller arity can be reused.

let add = (a: i32, b: i32): i32 => a + b;
let addOne = add(1); // this is now a function `(b: i32) => 1 + b`
let addTwo = add(2);
let two = addOne(1);
let three = addTwo(1);

This could look something like this:

class Func<Fn> {
  // fn: Fn;       // function to be called when all arguments are present
  // arity: usize; // current arity

  get length(): usize {
    return lengthof<Fn>();
  }

  get currentArg(): usize {
    return this.length - this.arity;
  }

  constructor(public fn: Fn, public arity: usize, public args: ArrayBuffer) {}

  static create<Fn>(
    fn: Fn,
    arity: usize = lengthof<Fn>(),
    args: ArrayBuffer = new ArrayBuffer(sizeof<u64>() * lengthof<Fn>())
  ): Func<Fn> {
    // This isn't exactly right since the size of each parameter could vary,
    // but for the sake of simplicity let's assume they are all usize.
    // Let's also assume there is a builtin `paramType<Fn>(i)`.
    return new Func<Fn>(fn, arity, args);
  }

  call<T>(arg: T): Func<Fn> | returnof<Fn> {
    assert(arg instanceof paramType<Fn>(this.currentArg));
    if (this.arity == 1) { // arg is the last one and we can call the function
      // This clearly needs to be a compiler operation to load the arguments as
      // locals, or to transform all parameter references into offsets and pass
      // the ArrayBuffer.
      return this.fn(...this.args);
    }
    let args = this.args.clone();
    store<T>(args.dataStart + this.currentArg, arg);
    return new Func<Fn>(this.fn, this.arity - 1, args);
  }
}

I know a lot of this isn't currently possible, just wanted to get the point across.
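A runnable TypeScript approximation of the arity-tracking idea (plain numbers only, no compiler builtins; all names here are illustrative):

```typescript
// Each call clones the argument list and decrements the remaining arity;
// when one argument remains, the underlying function finally fires.
type AnyFn = (...args: number[]) => number;

class Func {
  constructor(
    readonly fn: AnyFn,
    readonly arity: number,       // arguments still to be taken
    readonly args: number[] = []  // arguments collected so far
  ) {}

  call(arg: number): Func | number {
    const args = [...this.args, arg]; // immutable: clone + append
    if (this.arity === 1) {
      return this.fn(...args);        // last argument: invoke
    }
    return new Func(this.fn, this.arity - 1, args);
  }
}

const add = new Func((a, b) => a + b, 2);
const addOne = add.call(1) as Func; // partially applied: b => 1 + b
console.log(addOne.call(1));                // 2
console.log((add.call(2) as Func).call(1)); // 3
```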

dcodeIO commented 4 years ago

Potential implementation idea:

Problem: The same function can be called with either a function table index or a closure pointer. Solution: We know that allocations have an alignment of 16 bytes, so if we make all function table indexes with that alignment invalid (essentially skipping every 16th function table index, making it the null function), a callee can check whether it is dealing with a function table index or a closure pointer by means of:

if (fn & 15) {
  call_indirect(fn, ...);
} else {
  ctx = __retain(fn);
  call_indirect(ctx.index, ...);
  __release(ctx);
}

Here, ctx is a global referencing the current closure context that must be set before a closure is being executed. A closure context is a class-like managed structure of the form:

ClosureContext extends HEADER {
  index: i32;
  closedOverVariable1: Type1;
  ...
  closedOverVariableN: TypeN;
}

Compilation:

When encountering a closed-over variable

  1. Remember that the current function is a closure
  2. For each such variable, lay it out according to the ClosureContext layout in advance
  3. Replace what would be a local.get or local.set of each such variable with a load or store, respectively, relative to the current ctx.

When compiling a call to such a function

  1. Allocate the closure context and assign it to ctx
  2. Copy the function index, then each closed-over local's value to the closure context
  3. Call the function normally
  4. Copy the local values back to the caller's locals, or context if it is itself a closure
  5. Release the closure context

Performance impact:

What am I missing? :)

MaxGraey commented 4 years ago

Wouldn't this approach cause table fragmentation? I predict closures will be much more common than ordinary indirect functions.

dcodeIO commented 4 years ago

It'd cause the table to be 6.25% larger than it needs to be (every 16th element is the null function), but this is independent of how many closures there are, since the function index of a closure is stored normally into the table in the other 93.75%, and the closure pointer itself doesn't live in the table but in memory.

Let's say we have 20 functions in the table, with the first 10 being normal functions and the latter 10 being closures (just for the sake of this example), then the table looks like:

| Index | Element |
| ----- | ------- |
| 0 | $null |
| 1 | $normal1 |
| 2 | $normal2 |
| ... | |
| 10 | $normal10 |
| 11 | $closure1.index |
| 12 | $closure2.index |
| ... | |
| 15 | $closure5.index |
| 16 | $null |
| 17 | $closure6.index |
| 18 | $closure7.index |
| ... | |
| 21 | $closure10.index |

So, at the expense of a 6.25% larger table, we gain the ability to tell from the value representing a function (for example what Array#map is called with) if it is a normal function we can just call_indirect ((fn & 15) != 0) or a pointer to a closure in memory that we first have to unwrap ((fn & 15) == 0, use ctx.index for the call_indirect) and to refcount.

For example, if Array#map is being called with fn = 16, its code will see that this is not a valid function table index but an address in memory (to a closure), yielding code like

if (fn & 15) {
  // this is a normal function index, i.e. $normal5
  call_indirect(fn, ...);
} else {
  // this is a memory address to a closure, i.e. $closure3
  ctx = changetype<ClosureContext>(fn);
  call_indirect(ctx.index, ...);
}

with the lifetime of each closure reference being tracked by the variables retaining a reference to that closure.

function outer() {
  var a: i32 = 0;
  return function inner() { return a++; };
}
var theClosure = outer(); // closure lives as long as there is at least one ref to it
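The tag-bit check can be simulated in plain TypeScript, with a Map standing in for linear memory and an array for the function table (all names and the sample values are hypothetical):

```typescript
// Values whose low 4 bits are nonzero are function table indexes;
// 16-byte-aligned values are "pointers" to closure records in memory.
const table: Array<(x: number) => number> = [
  _ => 0,     // index 0: the null function
  x => x + 1, // index 1: a normal function body
];

// simulated heap: a closure record at the aligned "address" 16
const heap = new Map<number, { index: number; captured: number }>();
heap.set(16, { index: 1, captured: 41 });

function callRef(fn: number, arg: number): number {
  if (fn & 15) {
    return table[fn](arg);      // plain function table index
  } else {
    const ctx = heap.get(fn)!;  // unwrap the closure record
    return table[ctx.index](ctx.captured);
  }
}

console.log(callRef(1, 10)); // 11 (direct index)
console.log(callRef(16, 0)); // 42 (closure: 41 + 1)
```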
MaxGraey commented 4 years ago

Btw, the simplest optimization for closures is lambda lifting, which looks like:

function outer(n) {
  var x = 5;
  function inner(y) {
    return x + y;
  }
  return inner(n);
}

could be transformed via lifting into a simple pure function

function inner(y, x) {
  return x + y;
}

function outer(n) {
  var x = 5;
  return inner(n, x);
}

and later could be optimized by Binaryen to

function inner(y) {
  return 5 + y;
}

function outer(n) {
  return inner(n);
}

and finally

function outer(n) {
  return 5 + n;
}
dcodeIO commented 4 years ago

Turns out that a clean way to represent closures isn't quite enough yet. There are various corner cases with phis (closure in one branch), loops (modifying a local before we know it's captured), multiple closures in the same scope (can't alias to a single memory address) and nested closures (propagation of changes through closure contexts). Need to think about this more.

dcodeIO commented 4 years ago

Appears that the foremost building block we need there is another pass that finds out about captured locals before actually compiling the function. This way, we can allocate a single shared scope upfront when compiling a function that contains one or more closures, and make the function itself use that scope for the locals captured by any closure. A function reference would then consist of a function table index and a reference to its scope, with the function references keeping the scope alive - i.e., one level of indirection.

gabriel-fallen commented 4 years ago

I'm sorry to intervene but @MaxGraey asked if I had any ideas about implementing closures on WASM couple days ago ('coz I was thinking about them too).

@dcodeIO could you please elaborate on your last idea? "A function reference would then consist of a function table index and a reference to its scope" - when we're returning or passing around a closure, what scope we're going to pass along? The closure's scope itself or parent function's scope? And then what if a closure has another closure inside it?

It gets pretty hairy pretty quick! :smile:

If we're going to push this idea further I think we'll need a linked list of scopes to accommodate nested closures. And then something akin to de Bruijn indexes to lookup variables in the right scope. Pretty much like you do in your regular lambda-calculus interpreter (or Lisp interpreter) with lexical scoping and a linked list of environments.

On the other hand, if you're going to represent closures as "fat pointers" (a table index in this case + a pointer to the environment) you can get away with just a per-closure environment capturing all necessary variables from all parent scopes. Basically, the same scheme as @MaxGraey suggested initially. I'm just not sure how acceptable that is from a JS-interop point of view.

I hope I've said something helpful and sorry again for interfering! :)

dcodeIO commented 4 years ago

you can get away with just per-closure environment capturing all necessary variables from all parent scopes

That's the latest I had on my mind, yeah. As an example, if we have

function outer() {
  var a = 0;
  function inner() {
    var b = 0;
    a++;
    return function innerInner() {
      return (a++) + (b++);
    }
  }
  return inner;
}

there will be one lexical environment on the heap, the one containing a in outer being extended with b in inner (there can be multiple bs in different scopes, each with its own memory address).
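This single shared environment can be simulated in plain TypeScript (an object literal stands in for the heap allocation; names are hypothetical):

```typescript
// One shared heap environment for outer/inner/innerInner: the environment
// containing `a` in outer is extended with `b` in inner.
type Env = { a: number; b: number };

function outer(): () => () => number {
  const env: Env = { a: 0, b: 0 }; // single shared environment on the "heap"
  return function inner(): () => number {
    env.b = 0; // inner's local b lives in the shared environment
    env.a++;
    return function innerInner(): number {
      return env.a++ + env.b++;
    };
  };
}

const inner = outer();
const innerInner = inner();  // a == 1, b == 0 afterwards
const first = innerInner();  // 1 (1 + 0), then a == 2, b == 1
const second = innerInner(); // 3 (2 + 1)
console.log(first, second);  // 1 3
```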

Now, the values representing inner and innerInner would be pointers to refcounted heap allocations containing the function table index and a pointer to that environment, which we can distinguish from a plain non-closure function table index by having an alignment of 16 bytes, so these are interchangeable and any function taking a function reference can deal with both kinds by means of a potentially well-predicted if.

GC-wise, the heap allocations for each function reference would keep the environment, which is itself a refcounted heap allocation, alive, so as soon as there is no more reference to innerInner or inner, the environment would become collected. Until then, the environment keeps its contents alive.

While this potentially wastes some memory if a closure that uses an extended var is collected before another one that doesn't capture the same var, it allows us to map every capture to one exact memory location across all related closures instead of having to synchronize values between multiple environments, or manage a tree or list of sorts which would probably be costly, so seems worth it. All functions accessing any capture would replace their use of the respective local with loads and stores on the environment, even outer, though it doesn't need a function reference for itself.

Ofc there are various optimization opportunities here, with functions not capturing anything remaining plain function table indexes, or non-sharing/unrelated closures being simplified to a single heap allocation consisting of its function table index and the environment in one slab, which both should be very common in real world code.

The one level of indirection can be absorbed upon unwrapping the function index pre-call by storing the (direct) pointer to the environment in a global that the called function will use as a base for its loads and stores.

Regarding JS interop, we gain the ability to pass opaque function references around in JS (like from an export to an import), no matter if these are plain function indexes or a pointer to the managed function reference, because all receivers can deal with both. Only if JS wants to know exactly what the function index is, it needs to unwrap it if !(ref & 15). Furthermore, since we do not insert any additional parameters to closure signatures, as suggested before if a capture is read only, there is no magic happening that would break a call if not taken care of.

Still not 100% certain about this, though, since two heap allocations in the worst case seem odd. Hope that makes sense :)

gabriel-fallen commented 4 years ago

@dcodeIO yeah, I read the discussion and got your idea about exploiting address alignment to discriminate between plain indexes and closures. Neat trick. :wink:

What I suspect (I'm not sure though) is that you can get rid of one level of indirection, or at least get rid of allocating a GC'd closure structure on the heap, if you always use "fat pointers" for closures and function pointers. Your "fat pointer" would consist of a table index and a pointer to the environment. Kinda an "unpacked closure structure" if you use two parameters for that. Or a "packed closure structure" if we can put them both into a single i64, I guess?

The downside is, basically, that all functions that get called indirectly have to take an extra parameter - a pointer to the environment - even if they close over nothing and thus don't need one. On the other hand, you can pass at least an i32 or f32 in place of the unused environment pointer, I guess?

I haven't thought it through, but on the surface you don't waste table elements, especially for programs without closures...

dcodeIO commented 4 years ago

What I suspect (I'm not sure though) you can get rid of one level of indirection, at least get read of allocating GC'd closure structure on the heap, if you always use "fat pointers" for closures and function pointers.

Potentially, yeah. In this case we'd have to consider a potential WASM64 that might hit us one day, though, and that passing i64 between JS and Wasm isn't great yet. Like, we get a BigInt there, if it's supported yet.

The downside is, basically, all the functions that get called indirectly have to take an extra parameter

This can probably be avoided by unwrapping pre-call, i.e. calling the function index contained within and setting the environment pointer on a global only used internally prior. So instead of an additional parameter the function would access that global and restore it to the previous value post-call. Should work because functions execute in one piece anyway.

MaxGraey commented 4 years ago

The trick with a global parameter will be thread-unsafe in a multi-threaded environment.

jtenner commented 4 years ago

@MaxGraey It's only unsafe if the global is treated mutably.

gabriel-fallen commented 4 years ago

@MaxGraey I thought WASM doesn't support multi-threading? My feeling is WASM multi-threading is further away in the future than even WASM GC...

MaxGraey commented 4 years ago

@gabriel-fallen No, wasm threads have already worked in Chrome behind a flag for a long time, and Firefox Nightly, it seems, has also re-enabled SharedArrayBuffer. So wasm threads almost work already.

dcodeIO commented 4 years ago

trick with global parameter will be thread-unsafe in multi thread environment

But wouldn't threading imply one module instance per thread anyway? Like, if every module has its own dedicated global for that, the respective sets and gets shouldn't race.

MaxGraey commented 4 years ago

@dcodeIO Yeah, you're right. We could share only the memory, but not whole modules.

dcodeIO commented 4 years ago

The fat pointer idea got me thinking. If we do like 40 bits address (~1TB) and 24 bits of additional information, like the table index (~16M), that concept would not only work well for closures but also for other reference types in that we can stuff let's say the class id into there for other references. Doing so for everything would simplify the implementation in that retain/release could work with fat pointers directly. Still wondering what kind of performance impact we'd have to expect there, especially when such values pass out of and in to Wasm as BigInts.
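The 40/24 split could be packed and unpacked like this, a TypeScript sketch using bigint for the 64-bit value (function names and the sample values are illustrative):

```typescript
// Fat pointer layout: low 40 bits hold the address (~1TB), the upper
// 24 bits hold extra information such as the table index (~16M).
const ADDR_MASK = (1n << 40n) - 1n;

function pack(addr: bigint, index: bigint): bigint {
  return (index << 40n) | (addr & ADDR_MASK);
}

function unpackAddr(fat: bigint): bigint {
  return fat & ADDR_MASK;
}

function unpackIndex(fat: bigint): bigint {
  return fat >> 40n;
}

const fat = pack(0x1000n, 5n);
console.log(unpackAddr(fat) === 0x1000n, unpackIndex(fat) === 5n); // true true
```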

MaxGraey commented 4 years ago

Also, memory is always aligned to 16 bytes, so we additionally have 4 free bits at the least significant end.

dcodeIO commented 4 years ago

Also, if we decide to go that route, an unfortunate side effect would be that we'd have to update the entire standard library, unwrapping fat pointers everywhere. While not a blocker, this might result in additional slight perf hits and is a lot of work, so we should first be pretty certain that it's worth it.

MaxGraey commented 4 years ago

I guess it's better to build some bounded (simple) prototype, compare performance and usability, and decide later what is better.

dcodeIO commented 4 years ago

Have been looking into some basic prerequisites we need for closures with mutable captures, and so far I figured that we'll need:

First-class functions

Once function environments become decoupled from the execution stack, and stored on the heap, these must be RCed / kept alive as long as there is at least one function around using that environment. The most straight-forward way to achieve this is to represent functions with first-class objects:

abstract class Function<T> {
  readonly index: i32;
  private env: usize; // != 0 if a closure

  // ...implementations of Function#length, #toString etc...

  private __visit_impl(cookie: u32): void {
    __visit(this.env, cookie);
  }
}

The information we need here is the signature type T (i.e. to implement Function#length), the function table index index and a pointer to the bound function environment env. This is mostly identical to the path forward we worked out on the closure PR and only differs in the first-class object part. Instead of adding just a function index to the table, we'd also add a static Function<T> blob per function passed as a value.

Interestingly, it appears that this is the only breaking change here, so this can be done upfront before committing to the aspects below.

When forming a closure, we'd clone the static segment with a concrete env applied.

Compiler-generated function environment visitors

A function environment may contain references to other managed objects (in captured locals), so each function environment must keep these references alive while there is at least one function around using that environment. While we don't need source-level classes to describe them (we can instead just hard-code offsets and types during codegen), we need a compiler-generated visit impl per closure (each with a unique RT id) to walk its members, so the environment along with all its references can become collected, just like the compiler-generated implementations used for classes. As such, once the RC of the first-class function reaches 0, the above __visit_impl is invoked, in turn calling the compiler-generated visit impl on the function's environment, in turn decrementing the RC of its members, potentially collecting them along with the environment.

Recompilation of functions identified as closures and/or closure hosts

To know for sure that a function is going to capture an outer local, we'll have to compile it (here: transform it to Binaryen IR), since some branches might be statically eliminated (evaluation performed on Binaryen IR), leading to unused captures, or, potentially, functions that only are closures in certain situations.

The mechanism here might be to keep a currentFunctionEnvironment around that inner functions can populate upon compilation, increasing the memory offset within the environment and binding it to an outer local. When unrolling at the end of compileFunction, once we reach the closure host (top-most function that is not a closure but has some of its locals captured), we'll allocate the currentFunctionEnvironment for all the captures within its inner functions, including captures of inner functions again in inner functions. This greatly reduces the number of allocations we need to perform at the expense of one large environment, so instead of referencing one environment per function we are going to reference the common environment from all the first-class functions involved with a respective initial RC.

Afterwards we'd recompile the closure host and its inner closures (not necessary for non-closures), referencing the memory offset relative to the function environment's env instead of the local, on each access to a captured local, including in the closure host. Doing so eliminates the need to raise locals to the function environment or lower the function environment to locals on phis.

function outer() {
  var x = 0
  function inner() {
    x += 1
    function innerInner() {
      x += 1
    }
    innerInner()
    return x
  }
  return inner
}
let inner = outer()
let x = inner()

becomes

function outer() { // closure host
  _ctx = alloc(4, ENV_ID) // with RC = 3
  _ctx.x = 0
  let inner = { fn: function() {
    _ctx.x += 1
    let innerInner = { fn: function() {
      _ctx.x += 1
    }, ctx: _ctx }
    CALL(innerInner)
    __release(innerInner) // RC--
  }, ctx: _ctx }
  __release(_ctx) // RC--
  return inner
}
let inner = outer()
let x = CALL(inner)
__release(inner) // RC--

function CALL(closure) { // inlined per-call
  _ctx = closure.ctx
  return closure.fn()
}

Now this is mostly an exercise on an implementation strategy, and I'm currently working towards first-class function objects as a first step because we are going to need these anyway. Also note that the pseudo-code part almost looks too straight-forward to be correct, and I suspect that there's more to it than that, so if you have some time I'd appreciate any comments on what I missed. Let's focus on the correctness of the general implementation first before suggesting further optimizations (ofc there are some). Wdyt?
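The pseudo-code lowering above can be exercised as plain TypeScript, with `{ fn, ctx }` records standing in for first-class functions and a module-level variable standing in for the `_ctx` global (refcounting elided; names follow the pseudo-code):

```typescript
// The closure host allocates one shared environment; CALL sets the "global"
// environment pointer before invoking the function body.
type Env = { x: number };
let _ctx: Env = { x: 0 };

function CALL<T>(closure: { fn: () => T; ctx: Env }): T {
  _ctx = closure.ctx; // set the environment pointer pre-call
  return closure.fn();
}

function outer() {
  const ctx: Env = { x: 0 }; // shared environment of the closure host
  const inner = {
    fn: (): number => {
      _ctx.x += 1;
      const innerInner = { fn: (): void => { _ctx.x += 1; }, ctx };
      CALL(innerInner); // same environment here, so no restore is needed
      return _ctx.x;
    },
    ctx,
  };
  return inner;
}

const inner = outer();
const x = CALL(inner);
console.log(x); // 2
```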

DuncanUszkay1 commented 4 years ago

I think the main difference between this and what we have now is just that the context is bound to the host function, rather than the closure, which I agree is better overall.

Using the same context in the function and later in the closure rather than just using locals in the host is also an improvement, since then we don't have to do any copying.

In terms of these Function<T> classes, would these be exposed to the user? Does the existence of this Function<T> class really have anything to do with mutable captures or is it just something we're going to want later? Could we not accomplish the same with a nameless struct like we do now and move to that system later?

Instead of adding just a function index to the table, we'd also add a static Function<T> blob per function passed as a value.

Not sure what you mean here- are you referring to having one static method per function with signature T on the Function<T> class?

Also tagging @torch2424 and the mutable captures discussion here: https://github.com/DuncanUszkay1/assemblyscript/pull/11

dcodeIO commented 4 years ago

In terms of these Function<T> classes, would these be exposed to the user?

Yes, these are analogous to Function in JS, and in the future these can also provide implementations of Function#call, Function#apply etc. where feasible.

Does the existence of this Function<T> class really have anything to do with mutable captures or is it just something we're going to want later?

It is one of the basics, in that making these first-class objects essentially gives us a node in the graph to enable runtime integration, i.e. stuff referencing other stuff being kept alive / collected in time.

function outer() {
  let x = new Something() // must not leak or be collected while `inner` is live
  function inner() {
    trace(x)
  }
  return inner
}

Here inner is a reference to a Function<T> → referencing a compiler-generated environment (incl. visitor logic) → referencing its captured references, so when inner's RC reaches 0, ARC can traverse and collect.

Could we not accomplish the same with a nameless struct like we do now and move to that system later?

Probably, but not sure how forward-compatible it is and whether it is missing something for RT integration that needs to be filled in otherwise. Can you lay out how https://github.com/AssemblyScript/assemblyscript/pull/1308 approaches RT integration and lifetimes, i.e. how does the environment keep captured references alive and guarantee to collect them?

Not sure what you mean here: are you referring to having one static method per function with signature T on the Function<T> class?

No, just a static memory segment representing the Function<T>. Where we previously added just an index to the table, there'd also be a representation of the function in memory, containing its index and env. Essentially what we recently decided upon in https://github.com/AssemblyScript/assemblyscript/pull/1308, but as a user-facing Function<T>.
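A plain TypeScript sketch of what such a static blob might look like (hypothetical: the field order, sizes, and the `makeFunctionBlob` helper are assumptions for illustration, not the layout actually decided in #1308):

```typescript
// Hypothetical wasm32 layout of a static Function<T> blob: a table index
// followed by an env pointer (0 when nothing is captured).
function makeFunctionBlob(tableIndex: number, envPtr: number): Uint8Array {
  const blob = new Uint8Array(8);
  const view = new DataView(blob.buffer);
  view.setUint32(0, tableIndex, true); // little-endian, like linear memory
  view.setUint32(4, envPtr, true);     // 0 = plain function, non-0 = closure
  return blob;
}
```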

dcodeIO commented 4 years ago

One challenge I see on top of the above are closures in loops with scoped captures

function outer() {
  let x = 0
  let fns = []
  for (let i = 0; i < 3; ++i) {
    let y = i
    let inner = function() {
      return x + y
    }
    fns.push(inner)
  }
  return fns
}

where we can't have a single common environment for everything, due to y being unique per closure. It seems that typically there'd be a linked list of environments there, where the inner environment containing y extends the outer environment containing x.

Env0

| Key | Value |
| --- | --- |
| parent | null |
| x | 0 |

EnvN (scoped)

| Key | Value |
| --- | --- |
| parent | Env0 |
| y | varies |

Each inner closure would have a unique EnvN with the respective captured y, referencing the parent environment Env0 with captured x. By including a pointer to the parent environment, we keep the opportunity to hard-code offsets and types into code by emitting levels of indirection through parent: y is essentially a load<i32>(envN, 4) while x is a load<i32>(load<usize>(envN, 0), 4).
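The parent-link scheme can be modeled in plain TypeScript (a hypothetical sketch: the real compiler would hard-code these accesses as loads at fixed offsets in linear memory rather than chase object properties):

```typescript
// Model of the environment chain: one shared Env0 holding x, and a fresh
// EnvN per loop iteration holding y plus a link back to Env0.
interface Env0 { parent: null; x: number }
interface EnvN { parent: Env0; y: number }

function outer(): Array<() => number> {
  const env0: Env0 = { parent: null, x: 0 }; // shared by all closures
  const fns: Array<() => number> = [];
  for (let i = 0; i < 3; ++i) {
    const envN: EnvN = { parent: env0, y: i }; // unique per closure
    fns.push(() => envN.y + envN.parent.x);    // y directly, x via parent
  }
  return fns;
}
```

Calling `outer().map(f => f())` yields `[0, 1, 2]`, i.e. each closure sees its own `y` while sharing `x`.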

DuncanUszkay1 commented 4 years ago

Can you lay out how #1308 approaches RT integration and lifetimes, i.e. how does the environment keep captured references alive and guarantee to collect them?

Sure, objects with a signature type are marked as managed objects, since they're always pointers to a closure context (the analogous object to Function<T> in this suggestion) so they hook into the runtime the same way any other memory allocated object would.

One challenge I see on top of the above are closures in loops with scoped captures

This sounds like a good thing to explicitly disallow for now. From your solution it seems like it will be possible to do this later, and creating closures in a loop doesn't seem like a terribly intuitive use case.

dcodeIO commented 4 years ago

Sure, objects with a signature type are marked as managed objects, since they're always pointers to a closure context (the analogous object to Function<T> in this suggestion) so they hook into the runtime the same way any other memory allocated object would.

I figured that much, but if a closure environment contains a reference to a managed object, let's say to an object allocated with new, how does the managed object become cleaned up when the closure environment becomes collected without hooking into the runtime with a visit impl?

function outer() {
  let x = new Something() // RC(x)=1
  function inner() {
    return x
  }
  return inner // RC(x)=2 (was 1) by forming closure
  // RC(x)=1 (was 2) by exiting outer
}
let inner = outer()
__release(inner) // RC(x)=0 (was 1) by collecting inner, via visit impl
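The bookkeeping above can be modeled with a toy reference counter (a hypothetical sketch, not the actual runtime; it only illustrates why the environment needs a visit step to release what it captured):

```typescript
// Toy ARC model: releasing an object whose RC hits 0 "visits" its
// references so they can be released in turn.
class Managed {
  rc = 1;
  refs: Managed[] = [];
  release(): void {
    if (--this.rc === 0) {
      // the generated visit impl: traverse captured references
      for (const r of this.refs) r.release();
    }
  }
}

const x = new Managed();        // the `Something` above, RC(x)=1
const innerEnv = new Managed(); // the closure environment
innerEnv.refs.push(x);
x.rc++;                         // RC(x)=2: captured by forming the closure
x.release();                    // RC(x)=1: exiting outer
innerEnv.release();             // RC(env)=0 → visits x → RC(x)=0, collected
```

Without the visit step in `release`, dropping the environment would strand `x` at RC 1 forever.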

This sounds like a good thing to explicitly disallow for now. From your solution it seems like it will be possible to do this later, and creating closures in a loop doesn't seem like a terribly intuitive use case.

Not so sure about that, since Array#map etc. can easily end up being used within loops, and arguments are often closures.

gabriel-fallen commented 4 years ago

@dcodeIO I for sure missed a lot of discussion in other issues and I'm completely out of context now, sorry for that...

Could you please clarify, are you considering all function references as closures now? Otherwise I don't get how you are going to implement function calls through arguments, as in

function caller(fn) { return fn(0); }

function named(p) { return p + 1; }

caller(named);

const c = 5;
caller( n => n + c );
dcodeIO commented 4 years ago

Could you please clarify, are you considering all function references as closures now?

This is considering all function references to be represented by first class function objects. A function object may or may not represent a closure, depending on the value of Function#env. Both have in common that the function table index is located at Function#index, so calling a function reference (whether a closure or not) would look like:

var someFunc = ...
someFunc(...args)

becomes

_env = someFunc.env
call_indirect(someFunc.index, args)
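A rough TypeScript model of this calling convention (the `FuncObj`/`callRef` names and the table array are hypothetical stand-ins; the real lowering would set a global `_env` and use `call_indirect` on the wasm table):

```typescript
// Implementations receive the environment as a side channel.
type Impl = (env: any, ...args: number[]) => number;
const table: Impl[] = []; // stands in for the wasm function table

interface FuncObj { index: number; env: any } // Function#index / Function#env

function callRef(f: FuncObj, ...args: number[]): number {
  const _env = f.env;                   // _env = someFunc.env
  return table[f.index](_env, ...args); // call_indirect(someFunc.index, args)
}

// A closure capturing n = 10:
table.push((env, i) => i + env.n);
const closure: FuncObj = { index: 0, env: { n: 10 } };
// A plain function reference would carry env = null and ignore _env.
```

With this, `callRef(closure, 5)` evaluates to `15`, and closures and plain functions share one call path.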
DuncanUszkay1 commented 4 years ago

I figured that much, but if a closure environment contains a reference to a managed object, let's say to an object allocated with new, how does the managed object become cleaned up when the closure environment becomes collected without hooking into the runtime with a visit impl?

Good point, we would need to implement something similar eventually to make that work. Hadn't considered that.

Not so sure about that, since Array#map etc. can easily end up being used within loops, and arguments are often closures.

Fair, I suppose you could hit it pretty easily. It just seems like something we could maybe punt on for an experimental first release.

Also I see you're still going by the global context route: is there a reason you're doing that instead of what we have now with the special this argument?

I ask because the global context can cause issues if you call a closure from within a closure:

function example(): i32 {
  let x = 0;
  let fn = (arg: i32): i32 => {
    let y = arg * 10;
    let otherFn = (arg1: i32): i32 => { return arg1 + y };
    return x + otherFn(arg);
  };
  return fn(3);
}

Becomes

function example(): i32 {
  let x = 0;
  let fn = (arg: i32): i32 => {
    let y = arg * 10;
    let otherFn = (arg1: i32): i32 => { return arg1 + _env.y };
    _env = otherFn.env;
    let _result = otherFn(arg);
    return _env.x + _result; // We no longer have the proper env set
  };
  _env = fn.env;
  return fn(3);
}

So it seems to me like we'd need a stack of environments and swap between them if we go the global route, whereas the additional argument works as is.
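For contrast, the extra-argument convention can be modeled like this (a hypothetical TypeScript sketch, not actual codegen; `Closure` and the explicit `env` parameter are illustration names). Nested calls cannot clobber the caller's environment because every call carries its own:

```typescript
// Each closure is a pair of implementation and environment; the env is
// threaded through as an explicit first argument instead of a global.
type Impl = (env: any, ...args: number[]) => number;
interface Closure { fn: Impl; env: any }

function example(): number {
  const fn: Closure = {
    env: { x: 0 },
    fn: (env, arg) => {
      const otherFn: Closure = {
        env: { y: arg * 10 },
        fn: (e, arg1) => arg1 + e.y,
      };
      const result = otherFn.fn(otherFn.env, 1); // otherFn gets its own env
      return env.x + result;                     // fn's env is untouched
    },
  };
  return fn.fn(fn.env, 3);
}
```

Here `example()` returns `31` (`1 + 3 * 10 + 0`), with no environment stack needed.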