WebAssembly / gc

Branch of the spec repo scoped to discussion of GC integration in WebAssembly
https://webassembly.github.io/gc/

It doesn't seem like WebAssembly should have a GC #36

Closed justinmchase closed 4 years ago

justinmchase commented 6 years ago

I don't understand why this feature is being proposed. It seems like wasm is an inappropriate layer to put a GC into.

It seems like the addition of a GC and memory management is a feature of the higher-level runtime/language which compiles into WebAssembly, not of WebAssembly itself. What if different languages want different memory management strategies? What if they already have the code for a GC, written in C, for their higher-level language? Will they be able to compile to wasm and preserve the logic of their GC? Trying to add a one-size-fits-all GC at this layer seems to complicate things for existing runtimes which already have their own GCs.

From the spec overview:

the vast majority of modern languages need it

I will say that while this may be true, there is also a perception that the inefficiency of a GC in many modern languages is one of their biggest pain points. For example, the GC in .NET can cause problems in games because there is no way to control it deterministically, and if it collects at the wrong time it can heavily affect frame rate. The .NET GC behaves differently on different platforms partially for these reasons, and I just can't imagine a one-size-fits-all solution here.

Furthermore it seems to me like a lot of the newest modern languages have been explicitly experimenting with alternate memory management strategies as a prominent feature. So what would it be like for languages which do not want, or cannot use the GC and Type features offered here?

I could see an optional layer on top of wasm that new language developers could opt into in order to get some productivity gains, perhaps, but it's just not making sense to me how it is appropriate at the lowest level automatically.

I'm also skeptical of the "types" I see in some examples such as:

(type $point (struct (field $x f64) (field $y f64) (field $z f64)))

Again, this just seems like an inappropriate layer to add such features. It really seems like a language specific feature and shouldn't be in wasm.

StephenHodgson commented 6 years ago

What if different languages want to have different memory management strategies?

This is a very good point.

For example the GC in .NET can cause problems with games because there is no way to control it deterministically and if it collects at the wrong time it can affect frame rate heavily.

While this is true, that's more of an issue for novice engineers that don't know how to write efficient code.

justinmchase commented 6 years ago

While this is true, that's more of an issue for novice engineers that don't know how to write efficient code.

I don't know if this is a fair statement; even an experienced engineer with detailed knowledge of the inner workings of the language and runtime can struggle with the GC, and feel a lot of pain working against it in a language designed on the assumption that you pretend it's not even there.

For example this article: https://blogs.msdn.microsoft.com/shawnhar/2007/07/02/twin-paths-to-garbage-collector-nirvana/

The trick to dealing with the .NET garbage collector is... to never allocate anything, so it never runs! You have to painstakingly write your entire code base against the grain of how C# expects to be written, specifically to deal with the non-determinism of the GC. Even if you know exactly how to write efficient code, it is costly and time-consuming to fight the GC.
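Concretely, the "never allocate" discipline usually means object pooling; a rough sketch of the pattern (all names here are purely illustrative, not from the article):

```typescript
// Illustrative object pool: all allocation happens once at startup, so the
// collector has nothing to reclaim on the per-frame hot path.
interface Particle { x: number; y: number; alive: boolean; }

class ParticlePool {
  private pool: Particle[];

  constructor(size: number) {
    // The only allocation, done up front.
    this.pool = Array.from({ length: size }, () => ({ x: 0, y: 0, alive: false }));
  }

  acquire(): Particle | null {
    const p = this.pool.find(q => !q.alive);
    if (p) p.alive = true;
    return p ?? null;        // null = pool exhausted; no fallback allocation
  }

  release(p: Particle): void {
    p.alive = false;         // return to the pool instead of dropping the reference
  }
}
```

The cost is exactly what I described: every allocation site in the code base has to be rewritten against a pool like this.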

But anyway, this is just an example. A GC is totally fine in tons of cases, but it really seems like it should be bundled as a feature of higher-level languages so that different strategies can be used for different scenarios. You could then pick what works best for your use case rather than having a one-size-fits-all solution that may not work well in certain scenarios.

aardappel commented 6 years ago

@justinmchase I voiced similar concerns here: https://github.com/WebAssembly/gc/issues/30

I still think, and agree with you, that the number of different kinds of GC implementations is almost as large as the number of languages, and that a one-size-fits-all system won't work.

That said, one could argue that even if the majority of languages implement their own GC in linear memory, having an available GC is "nice to have" for languages that can use it, since it will be a huge code size reduction (for the statically linked case).

One worry I'd have is a built-in GC being the preferred/simplest way to communicate with JS & the host, putting all linear memory languages (GC or not) at a disadvantage.

PaulBone commented 6 years ago

In wasm's environment, wasm code (sometimes libraries or modules of larger systems) will have to interact with the DOM and with JavaScript. The most straightforward way to do that is to allow it to use the GC already written and available in the browser (JavaScript's GC).

RyanLamansky commented 6 years ago

Given that WASM doesn't currently allow shared-memory multi-threading or provide access to the stack pointer, it's impossible to build a GC implementation that's anywhere close to the performance of the one included with .NET and other GC-expecting code environments.

The Spectre vulnerability, and anxiety over similar attacks, means shared-memory multi-threading isn't coming any time soon, and stack pointer access (or something like it) hasn't even been proposed, so it's safe to say we're years away from getting either of those features. Considering that, plus the JS/DOM integration possibilities offered by a browser-native GC, it's worth keeping on the current path.

Don't forget that WASM GC is an opt-in feature. Programs written in C/C++/Rust won't need it, whereas C#/Java require it for conventional application designs.

xtuc commented 6 years ago

While I agree that in some cases the built-in GC is not needed (C++, Rust, etc.), keep in mind that once we can pass complex objects from JavaScript to WebAssembly (a DOM reference, for example), someone has to collect them.

@RyanLamansky

Don't forget that WASM GC is an opt-in feature

I don't disagree, but could you please point to the spec?

sendilkumarn commented 6 years ago

(a DOM reference for example) someone has to collect them.

While I agree with most of the points here, how much impact will that have on JS's GC?

Don't forget that WASM GC is an opt-in feature

This is interesting (if there is a discussion / spec available somewhere)

xtuc commented 6 years ago

@sendilkumarn

How much impact will that have on JS's GC?

Of course both GCs should work together (if not only one), but that's an implementation detail.

rossberg commented 6 years ago

@PaulBone made a number of important points. A couple more clarifications:

RyanLamansky commented 6 years ago

I don't disagree, but could you please point to the spec?

This is interesting (if there is a discussion / spec available somewhere)

It's like linear memory in the MVP spec. You don't have to use it. The C/C++/Rust programs compiled to WASM today will continue to not use GC even after it's available. All we're doing here is enabling high performance GC for use cases that require it.

aardappel commented 6 years ago

@PaulBone do note that the current GC proposal is for statically typed structs, whereas the GC that already exists for JS is used to work on various representations of JS objects, which are very dynamic. So interop is not quite as simple as "just use the existing GC" (unless it was made to coincide with JS typed objects, but that doesn't help any with all the existing APIs).

In fact, though I personally think the current statically typed GC design is a better idea, an alternative design for GC would be just to expose creating & accessing JS objects. That would be simpler and provide more direct interop, but would likely be useless for statically typed languages.

@rossberg I agree every language shipping large runtimes because of GC is undesirable, but us providing a GC isn't necessarily going to change that. Many language implementors will decide to still bring their own GC because they feel they need a certain level of control we don't provide. Even if using the wasm GC could bring similar semantics and performance to a language's custom GC, there may still be reasons why it doesn't get used, because de-tangling the GC from a language's runtime may simply be too much work.

Not saying we shouldn't add GC, but we should be realistic about what problems it can solve, and for who. We should also be a good platform for GC languages (and languages using ref counts, regions, uniqueness types..) that are not using the built-in GC.

sjrd commented 6 years ago

Many language implementors will decide to still bring their own GC because they feel they need a certain level of control we don't provide.

As a data point, let me be one of the language implementors who would not bring their own GC. In fact, I wouldn't even consider compiling Scala.js to Wasm if this GC proposal was not on the table. Mostly because of @PaulBone's last bullet at https://github.com/WebAssembly/gc/issues/36#issuecomment-394154210. Without interaction between the heaps of Scala.js and JS, Scala.js as a language is pointless.

evancz commented 6 years ago

Speaking for Elm, I am facing two options:

  1. Write a custom GC, incurring a pretty large download overhead
  2. Use the GC with WASM, working around things that are not exactly ideal for me

I think path (2) looks much more promising, especially considering that download times are a way bigger issue than runtime perf as far as I can tell from the users I talk to.

It is possible that it will not work out ultimately, but rather than speculating that it might happen, I'm happy to talk about what I need from a GC and make sure it works for that!

justinmchase commented 6 years ago

With regard to the download size issue, it's possible to have multiple modules composed together with imports and exports; with browser caching, is this really a problem? The GC is hardly the only significant part of a language's runtime; shouldn't they strategically break builds and versions apart into modules to take advantage of browser caching?

Rather than letting this permanently shape the design of other features, wouldn't it be better to attempt to resolve this issue more directly, such as having pre-approved, pre-cached runtimes already fetched by the browser? That could potentially reduce the size of the entire runtime for all applications.

I'll have to chew on some of the other arguments more, but I remain pretty skeptical.

PaulBone commented 6 years ago

@aardappel

@PaulBone do note that the current GC proposal is for statically typed structs, whereas the GC that already exists for JS is used to work on various representations of JS objects, which are very dynamic. So interop is not quite as simple as "just use the existing GC" (unless it was made to coincide with JS typed objects, but that doesn't help any with all the existing APIs).

That's why in SpiderMonkey we're working on JS typed objects and plan to make Wasm GC work using the same underlying code.

In fact, though I personally think the current statically typed GC design is a better idea, an alternative design for GC would be just to expose creating & accessing JS objects. That would be simpler and provide more direct interop, but would likely be useless for statically typed languages.

I'm not involved enough with WebAssembly to know what the thinking is on that, but I'm fairly confident the people involved would have considered it.

But you're right that using JavaScript objects to provide GC rather than the typed object proposal wouldn't do enough to support the many many typed, GC'd languages.

PaulBone commented 6 years ago

@aardappel

@rossberg I agree every language shipping large runtimes because of GC is undesirable, but us providing a GC isn't necessarily going to change that. Many language implementors will decide to still bring their own GC because they feel they need a certain level of control we don't provide. Even if using the wasm GC could bring similar semantics and performance to a language's custom GC, there may still be reasons why it doesn't get used, because de-tangling the GC from a language's runtime may simply be too much work.

As well as being a Mozilla engineer working on SpiderMonkey's GC, I'm also a language implementer considering using wasm. I haven't looked in detail, but I'm fairly sure I'd be quite happy to use wasm's GC and ship only a minimal "runtime" (probably really a standard library, since wasm provides what is normally in a runtime).

Not saying we shouldn't add GC, but we should be realistic about what problems it can solve, and for who. We should also be a good platform for GC languages (and languages using ref counts, regions, uniqueness types..) that are not using the built-in GC.

I'm fairly sure those are all better off using linear memory anyway. When you want enough control to use something like region-based memory management, you typically care about memory layout too, which is the kind of control linear memory (already implemented and not going away) gives you.

titzer commented 6 years ago

I think the discussion here is missing a crucial point in that languages that bring their own GCs and compile them into the linear memory are going to end up with lifetime issues that will have to be solved by a further extension to WASM.

In the general case, the lifetime of a "bits and bytes" object in the WASM linear memory--let's call this a guest object--may depend directly on the lifetime of an embedder object--let's call this a host object--and vice versa. The only proper way to solve this lifetime co-dependency is with extensions to WASM across the embedder boundary.

We are already working on one direction for JS->WASM objects. In this case, the host object is a JS object and the guest object is, e.g. a C++ object. The WeakRefs proposal for JavaScript, and the ability to have tables with weak references, with a finalizer callback, gives the necessary functionality to allow a guest object to be reclaimed when a host object is determined unreachable by the host GC. However, it does not provide the reverse functionality. If the "root" of the lifetime lies in the guest's linear memory, then there must be a corresponding strong reference to the host object to keep it alive. This, in short, leads to memory leaks.

This is essentially the same problem as V8 and Blink face. The solution to that next problem is, in the limit, cooperative tracing across the embedder boundary. After many years of development, both V8 and Blink have GCs that cooperate to compute a combined transitive closure over their heaps. That means that a protocol exists whereby the guest cooperates in determining the liveness graph of the host objects.

This is a seriously suboptimal place to be, IMHO, because it then requires a whole new set of functionality that introduces a new contract that comes along with both latency and algorithmic problems. It makes the GC algorithm of the host observable in a very detailed way (e.g. the order that references that cross the boundary are presented). Between V8 and Blink the observability is not really a big problem, since Blink is not "random user code" from the internet. However, the latency and algorithmic problems remain, which is why the V8 and Blink teams are now pursuing the longer term goal of a unified heap.

There are other problems. For example, when there are both host and guest memory resources, heuristics designed to optimize the usage of one operate poorly without knowledge of the other, e.g. the simple problem of heap sizing. V8, in particular, is quite sophisticated in how it chooses its heap sizes, since it is tasked with balancing GC latency, memory usage, and application throughput. When it has no insight into the occupancy of another region (e.g. Blink's heap, or large shared array buffers, etc.), we've repeatedly seen the same problem recur: we cannot come up with good heuristics, and often end up thrashing and/or running out of memory. With a guest memory in WASM, this problem will only get worse.

In short, the above cooperative tracing protocol would be something to seriously avoid.

IMO we should make every effort to avoid ending up with the cooperative tracing problem, and with it an expensive and leaky abstraction that leads to a poorly controllable system overall.

Therefore I think it is imperative that we find a GC solution that works well enough for guest languages that they do not need to bring their own GCs in the general case. Everything else is untenable in the long run, as I believe that all systems will eventually converge to wanting a unified GC, though that might entail custom algorithms for subheaps.

aardappel commented 6 years ago

@titzer to clarify, if you're advocating that a good general GC is the solution (as opposed to, say, registering finalizers and other "keep this object alive for me until I say so" functionality), then how does e.g. C++ use the GC to keep a host object alive? Does it allocate a wasm GC object just as a way to hold a reference to a host object? Doesn't this just shift the problem? And should a language like C++ have to deal with allocating GC'd objects at all?

I think most people above appear to agree that having a good, widely applicable GC is a good thing to have. I personally estimate (YMMV) that there will be a LOT of languages that will choose to implement their own memory management (GC or otherwise) that need to be catered for, no matter how good you make the built-in GC. I don't think "you should just re-implement your language using the built-in GC" is an acceptable answer if these languages run into troubles.

As an aside, thinking further ahead, for high performance languages and use cases, I'd hope there would be ways wasm can call into the host (e.g. render this buffer of commands to the GPU) that does not involve objects and lifetime management at all, such that this becomes a non-problem.

rossberg commented 6 years ago

@aardappel, a low-level language can hold a host reference per the reference types proposal. With that, a JS/Web embedding of Wasm already has GC "built in", even though Wasm code cannot allocate GC objects itself, and even though it has to jump through hoops to hold such references in its own data structures (by indirection through tables).
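That indirection looks roughly like this on the host side (a hedged sketch; the class and all names are hypothetical, not from the proposal):

```typescript
// Linear-memory code cannot store a host reference directly, so it stores
// an integer handle instead; the actual references live in a table on the
// host side. A freed slot's index is recycled for later allocations.
class HandleTable<T> {
  private slots: (T | undefined)[] = [];
  private free: number[] = [];

  add(ref: T): number {
    const i = this.free.pop() ?? this.slots.length;
    this.slots[i] = ref;
    return i;                // this integer is what lives in linear memory
  }

  get(i: number): T | undefined {
    return this.slots[i];
  }

  drop(i: number): void {
    this.slots[i] = undefined;  // explicit release -- nothing is automatic
    this.free.push(i);
  }
}
```

Note that `drop` must be called explicitly, which is exactly the hoop-jumping: lifetime management of host references becomes manual on the guest side.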

Language runtimes can always roll their own GC in linear memory, and they can do that today. But I doubt that many prefer it. Most language implementers I've had a chance to talk to explicitly called out the current lack of GC as a showstopper. My attempts to convince them to roll their own have failed every time. Targeting the built-in GC is standard practice in all the competing portable ecosystems (JS, JVM, CLR).

aardappel commented 6 years ago

@rossberg I do not think we can compare with e.g. the JVM. Simulating linear memory in the JVM and then building a custom GC on top of that is clearly way too clumsy and slow to even be considered; this is not comparable to the situation we have in wasm. On top of that, many languages have intricate runtimes in C/C++ which compile almost unmodified to wasm linear memory, and there is no such facility when targeting the JVM. The JVM has plenty of languages targeting it, but for quite a few it is an "impedance mismatch", and plenty of languages are missing too. For wasm, I expect it to become a target for nearly 100% of languages, with much less impedance mismatch.

But ok, hard to predict the future.

TomMarius commented 6 years ago

There definitely needs to be a single shared GC. It's very inefficient to have each individual module (of those that compose the final application) roll and run its own GC.

aardappel commented 6 years ago

The recent slides on the Go GC (https://blog.golang.org/ismmkeynote) are interesting reading for anyone who wonders whether languages will typically be able to use a generic wasm GC as-is, or whether they are so specific and interwoven with the language runtime that they'll prefer to keep their own.

evancz commented 6 years ago

@aardappel, that is a very interesting talk! I am glad you shared it!

But does the Go team have any interest in running in browsers in the first place? If so, they can probably easily confirm or deny if they would have difficulty getting their runtime working with wasm GC. They can also probably give insights on whether the GC design they used for servers would make sense for a GC for user interfaces. I would expect the allocation patterns to be quite different, but they could just tell us.

Point is, if this really worries you, I think it's worth confirming with the Go people that the concerns you raise are legitimate from their perspective. They were in MTV when I was working there, so it should be easy to ask. If so, great, that information can help us find a better path.

justinmchase commented 6 years ago

I don't think it's fair to assume that people targeting wasm will exclusively, or even primarily, be interested in running in a browser. Node.js, for example, can load and run wasm, so targeting wasm purely server-side is very compelling.

Right now Node.js dependencies sometimes ship C++, and if that code could be compiled ahead of time into wasm and delivered as part of the package, that would be a really important use case in my opinion. Any language could be used to deploy crucial native code paths.

Here is a package doing this for a native dependency that I use at work, which currently requires a fair amount of dockerization and adds complexity that could be removed if I just had it in wasm instead: https://www.npmjs.com/package/expat-wasm

evancz commented 6 years ago

Lots of people want Haskell to run in browsers too. Is that the ideal language for browsers? Should that concern block any project that may help other languages? The "what about literally everybody?" angle seems to be a very strange way to look at this proposal.

But you are talking about nodejs though. Find these people. Bring them into the process and they can say whether your concerns are legitimate. Does the C++ code cause specific problems given the current specification? Have them read the current plans. Do they see a specific problem? Great! We can do a better job now.

I just don't get the theorizing here. All the people we are talking about exist and can confirm or deny these concerns. From there you can make a list of specific concerns. Maybe some can be addressed. Maybe some cannot. If this is a big deal, it should be easy to make a list of specific issues and confirm them with the relevant programmers.

aardappel commented 6 years ago

@evancz wasm is not intended to be a browser-only technology; I expect a lot of platforms to find a use for it. I'm also not trying to solve the specific problem of supporting Go; I'm just giving an example of the wide diversity in languages/runtimes and GC requirements that are out there.

We only ever get to design a GC for wasm once, so it would be a shame if we ended up with something that goes largely unused, while also poorly supporting languages that need to do their own thing in linear memory. That's a situation I'd like to avoid, or at least make sure we've considered the consequences of. I don't see why it wouldn't be worth bringing in as many perspectives as possible, regardless of whether they bring up a specific concern.

justinmchase commented 6 years ago

I'm sorry, I'm not trying to be a drag or a jerk, but it seems like now is exactly the right time to be theorizing: the design and specification stages. Making a platform designed to support "literally everybody" seems like the entire purpose of wasm: to open it up to a multitude of languages and features, not just JavaScript. The exact syntax and design of a language is only part of the equation; the memory management system is a major feature as well, and therefore the appropriateness of baking only one into the lowest levels of the system should be questioned at this phase.

Find these people. Bring them into the process and they can say whether your concerns are legitimate.

Shouldn't this have been done before specifying this feature? What I hear you saying is that only problems with the specific issues in the plans will be addressed, but the issue raised here isn't specific, it's general. The general issue is that there will be languages and platforms that differentiate themselves with different memory management strategies, and there needs to be a way to solve the challenges discussed here for those languages. I know several people have chimed in here with relevant information.

I know that Elm is a language that compiles into JavaScript, and I can see completely how the creator of a language like Elm would want a JavaScript-less version of the JavaScript engine to build on top of. But perhaps it should be a separate module from the wasm runtime, just as .NET or Java would have. You could then easily build on top of it just like any other platform. And perhaps browsers could be configured to pre-cache a number of these popular runtimes automatically, so that when an app gets loaded they're already there? There are some solutions here that seem, theoretically, like the right direction to go.

For example project Blazor by Microsoft: http://blog.stevensanderson.com/2018/02/06/blazor-intro/

When I load one of these Blazor apps I can see in my network tab that it is literally downloading raw .NET DLLs and loading them into the browser. I see mscorlib.dll in there, which is where the GC lives, I believe. It's got a JavaScript interop layer so that you can interact with the browser from within C# (or whatever .NET language). Why is this not good enough? What problem does this solution have that can't be solved by just adding a little caching?

Maybe the Mono team would come in and tell me exactly why they would prefer a built-in GC and I could have my mind changed, but it seems worthwhile to take a hard look at the motivations here before this goes down a path that's hard to undo.

rossberg commented 6 years ago

We need to stay realistic, and scope our goals realistically. For one, no solution will work for every case, and second, there will always be a certain cost attached to the safety and portability guarantees that Wasm needs to provide (and has to do so without cooperation from, or knowledge of, a specific language runtime). Even for C we accept a small percentage of overhead.

Go is very special among GC-ed languages because it chose to allow interior pointers, which are unlikely to ever be equally (space-)efficient with Wasm GC -- or almost any other GC out there, for that matter. So for better or worse, Go will probably want to roll their own in any possible setting. OTOH, it's also unrealistic to assume that the same degree of performance tuning discussed in Rick's presentation is possible (or even relevant) under the constraints that a platform like Wasm needs to guarantee, built-in GC or not.

TomMarius commented 6 years ago

Remember that with the addition of WASM GC, you can still use your own implementation! You will not get some of the goodies, but it will not be worse than it is now.

julesjacobs commented 6 years ago

It is possible to support custom GC without a shadow stack with (I think) fairly minimal changes to the wasm language, and the same changes would support implementing lightweight threads/continuations. In fact it is already possible to do that right now without any changes to wasm if you can tolerate a little bit of overhead.

The paper "Accurate Garbage Collection in an Uncooperative Environment" by Henderson describes a way to scan the stack without run-time support, but it needs to memcpy the stack back and forth, and it cannot support moving GCs. It is possible to modify the technique to remove those limitations:

The GC needs to be able to scan stack frames for roots, and a moving GC needs to modify pointers in the stack to point to the new location of the objects.

To capture the stack you throw a DoGC exception that records stack frames as it bubbles up the stack. Each function call is wrapped in an exception handler that knows what to do.

function foo(...){
   int x,y,z;
   ...
   result = f(...)
   ...
}

gets compiled to:

function foo(...){
   int x,y,z;
   ...
   try{ result = f(...) }
   catch(DoGC stack){
      stack.push(new StackFrame(foo,    // the function we're in
                                2,      // number identifying this return position
                                x,y,z)) // locals
      throw stack; // capture the stack frames that are higher up the stack
   }
  ...
}

At the bottom of the stack you have an exception handler that does GC:

try{ main() }
catch(DoGC stack){
   // scan roots of stack
   // do gc
   // restore the stack
}

Now we've got our hands on all the roots, but we've unwound the machine stack in the process, so we need a way to restore it. Fortunately, all the information necessary to continue execution is saved in the stack object. For each stack frame type you add code to resume execution. For example, to resume foo right after the call to f(...), with return value v for f, you do:

resume_foo(stackframe, v);

resume_foo contains a copy of foo, except it takes different arguments and has a prelude to jump into the middle of the function:

function resume_foo(stackframe, v){
   int x,y,z;
   x = stackframe[0];
   y = stackframe[1];
   z = stackframe[2];
   switch(stackframe.returnposition){
     case 1: ...
     case 2: 
       result = v;
       goto resume2;
   }
   ...
   try{ result = f(...) }
   catch(DoGC stack){
      stack.push(new StackFrame(foo,    // the function we're in
                                2,      // number identifying this return position
                                x,y,z)) // locals
      throw stack; // capture the stack frames that are higher up the stack
   }
   label resume2:
  ...
}

If a moving GC wants to move an object that is pointed to by a pointer on the stack, it can simply mutate the stackframe object before it is resumed, and the code will resume with the new pointer.

We have a way to unwind and capture the stack, and a way to resume execution of a stack frame. The overhead:

  1. Exception handlers around every function call
  2. All code gets duplicated for the resume_ functions

With zero-cost exceptions, the run-time cost of (1) is eliminated in the common case where no DoGC exception is thrown. If wasm supported a way to jump into the middle of another function (or equivalently, functions with multiple entry points) then no code would have to be duplicated: resume_foo could just goto foo.resume2. Then the remaining overhead is the extra code in the catch handlers and the dispatch logic in the resume functions, and this is unavoidable. The important point is that there is no overhead during normal execution of code. Function calls and returns are just normal function calls and returns, and local variables are just local variables (no shadow stack).

The same mechanism can be used to implement lightweight threads, delimited continuations, and effect handlers with no run-time overhead in the common case (as opposed to CPS transform, which allocates at every function call). To capture a delimited continuation or effect handler you don't unwind the whole stack but simply stop at the reset point or handler. By restoring the stack frames lazily you can achieve amortised O(1) cost for effect handlers, which I don't think other implementations achieve. Lazily restoring the stack could theoretically also help GC stack scan pause times, because if a portion of the stack that was captured during the previous GC cycle has not yet been restored before the next GC cycle, then we saved the work of capturing and restoring it. This might only matter with recursive code or with Java frameworks ;-)
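To make the scheme above concrete, here is a condensed, runnable toy version of the whole capture/resume cycle. It assumes one frame type, a non-moving GC, and a fake allocator standing in for the real GC trigger; all names are illustrative:

```typescript
// countdown(n) computes n by recursion; the leaf call "allocates", which
// triggers a DoGC unwind the first time. Each frame's catch saves its
// locals, the bottom handler sees every root, and the stack is rebuilt by
// resuming frames innermost-first.
class DoGC {
  frames: number[][] = [];           // saved locals, innermost frame first
}

let gcPending = true;                // fake "out of memory" flag

function gcAlloc(): number {
  if (gcPending) throw new DoGC();   // no memory: unwind to capture roots
  return 0;                          // the freshly "allocated" value
}

function countdown(n: number): number {
  try {
    if (n === 0) return gcAlloc();   // leaf allocation site
    return 1 + countdown(n - 1);
  } catch (e) {
    if (e instanceof DoGC) e.frames.push([n]); // record this frame's locals
    throw e;                         // let callers record theirs too
  }
}

// Resume one frame with the callee's result v. The n = 0 frame was
// interrupted mid-allocation, so it retries gcAlloc(); every other frame
// continues at "1 + <result of recursive call>".
function resumeFrame(locals: number[], v: number): number {
  return locals[0] === 0 ? gcAlloc() : 1 + v;
}

function runWithGc(n: number): number {
  try {
    return countdown(n);
  } catch (e) {
    if (!(e instanceof DoGC)) throw e;
    // Every root is now visible; a real collector would scan (and, for a
    // moving GC, rewrite) e.frames here.
    gcPending = false;               // the "GC" ran; memory is available
    return e.frames.reduce((v, locals) => resumeFrame(locals, v), 0);
  }
}
```

Once `gcPending` is cleared, subsequent calls run straight through with plain calls and returns, which is the "no overhead during normal execution" property claimed above.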

rossberg commented 6 years ago

@julesjacobs, unfortunately, this approach (like many others) only works in a homogeneous environment, where you know that all stack frames are yours and you have full cooperation from all functions. These assumptions do not usually hold in a Wasm environment, which can freely call back and forth between multiple languages (including the host and its API).

Also, before I see hard numbers on a Wasm engine I would reserve some skepticism about its cost advantage over a simple shadow stack. ;)

julesjacobs commented 6 years ago

Right, to be able to do GC during long running functions from other languages you'd have to save the stack before calling them.

If interoperability is a goal aren't you better off with a high level VM anyway? Or maybe that is what wasm GC is meant to be? On the other hand, to support languages efficiently you essentially need the union of their type systems.

A shadow stack slows down all function calls and returns. The benchmarks in "Extending the UHC LLVM backend: Adding support for accurate garbage collection" say that shadow stacks cause a 60% slowdown. I think that is for a non-moving GC. With a moving GC you also need to restore locals from the shadow stack after a return, which could cause further slowdown.

I think that it would be nice for a low level VM to support custom GC & lightweight threads & delimited continuations & effect handlers that don't slow down normal code. All I'm saying is that it suffices to have zero cost exceptions and multiple function entry points. Sounds like a good cost/benefit to me, but YMMV.

jdh30 commented 5 years ago

My 2p:

tl;dr: I think you're describing a very specific GC design that would work well for some languages but would seriously impede others. I suggest a high-level VM providing a GC and CLR as a separate project to WASM.

I'm looking at targeting WASM and would choose an HLVM-like GC design because it is extremely efficient for the kind of code I write. This relies upon tail calls as the only looping construct: GC tests are injected at the beginning of every function to ensure that they are always called regularly. Global roots are fat references but all references within the heap are slim. Unions are unboxed structs (so I'd like product types in WASM's type system, which would also provide multiple value return) and only cyclic backward references are pointers. HLVM's GC algorithm is very simple, non-moving and not generational. The core of the GC is ~100LOC with one autogenerated function for each monomorphic type. I'd guess download size wouldn't be an issue but I haven't measured anything. The whole thing is efficient not because the GC algorithm is efficient but because the data representation places very little stress on the GC in the first place so it doesn't need to be efficient.

The descriptions I've read here about GCs are setting off alarm bells for me. In particular, "moving", "generational", "tagged" and so on. That is a Lisp or OCaml GC for a uniform data representation where almost everything is unnecessarily boxed, allocation rates are through the roof and GC stress is off the charts. For many applications it is extremely inefficient. I don't need support for moving GCs and wouldn't want to pay an overhead for reloading pointers. I wouldn't want a generational GC and I don't want to have to pay for a write barrier or high latency nursery collections. I certainly don't want any tagging: there's just no need for it when you have static type information.

The problem is essentially that the GC design under discussion here swings entirely upon the data representation which is completely up in the air.

@julesjacobs "shadow stacks cause 60% slow down". HLVM used unoptimised shadow stacks with fat quad-word references. My measurements indicated up to 70% slowdown on one benchmark (list-based n-queens solver) but generally excellent performance everywhere else. I'm sure that 70% could be reduced a lot simply by optimising shadow stack pushes (HLVM pushed everything whether it was needed or not).

justinmchase commented 5 years ago

Interesting article here which is relevant: https://prideout.net/slides/filawasm/

Slide #8 says:

Why I love WebAssembly

  • No garbage collection
  • Simple linear memory
  • No dynamic types or eval

I just thought this might be a useful use case to consider.

aardappel commented 5 years ago

@justinmchase those reasons to love WebAssembly will not go away, the presence of a GC will not make them less efficient, and languages can (and in my personal opinion, should, if they can) target a non-GC subset of wasm.

aykevl commented 5 years ago

I'll give you my view from the perspective of someone who wrote an alternative Go implementation that supports WebAssembly with enormously reduced code size (kilobytes instead of megabytes). Having a built-in GC would be enormously helpful for a few reasons, often boiling down simply to code size. See #59 as well.

Without a built-in GC, you'd have to do these things, reducing performance and increasing code size:

gosub-com commented 5 years ago

The hidden execution stack causes problems with scanning GC roots and also implementing Goroutines. You could solve both problems by making the execution stack visible and allow it to be set to any location in memory. Switching tasks is just setting a new stack location and returning to the calling function. Scanning the stack for GC roots becomes possible.

If there's no way to do this while maintaining security and without impacting code quality (or if it's just too much work), then I have a second proposal. Implement an efficient user stack with a defined layout, including what gets put there when a function is called. Have a second instruction set that works with this stack: us.i32.const 50, us.i32.mul, etc. I think Emscripten makes its own stack in memory, so this would probably be a quick, easy way to improve code quality for all C++ programs.

cbleser commented 5 years ago

As a chip designer, I wish GC had been implemented in hardware many years ago. But as it is, we are stuck with the old CPU architectures from Intel, AMD and ARM, which probably won't change for the next 100 years or so. Old papers already solved the GC problem in hardware via a more advanced MMU-like system. The only things we know how to do today are branch prediction, multi-stage pipelines, caches, etc. The core architecture has not developed much in more than 10 years. It would be nice to have hardware-assisted GC; then software developers would not have to discuss this issue forever and we could have more robust software.

TomMarius commented 5 years ago

@cbleser could we implement GC support for WASM as RISC-V extension?

dumblob commented 5 years ago

@TomMarius @cbleser see also https://github.com/WebAssembly/gc/issues/65 .

carlsmith commented 4 years ago

Don't forget that WASM GC is an opt-in feature.

This point needs to be driven home. Code that works without a GC now will continue to work the same way once GC is available.

justinmchase commented 4 years ago

I'm going to close this as I think we talked it through pretty extensively and I have no further questions. Feel free to keep commenting here if there is anything useful to add!

jdh30 commented 4 years ago

Just re-reading this and there's one thing I'd like to add: what if WASM's GC support consisted of an API that allowed Javascript's GC to reach into the heap in WASM?

In order to traverse the heap, Javascript's GCs probably already have a function to identify global roots, a function to identify the references within a given heap-allocated block and functions to allocate and release heap-allocated blocks of memory.

WASM could allow programs to (optionally) provide an implementation of this API so the host Javascript GC would reach into the WASM heap, trace it and ask for unreachable heap blocks to be released. In particular, this allows cycles between the JS and WASM heaps to be collected without issue. Programs not wanting this would suffer no ill effects. Programs wanting this could piggyback Javascript's GC for as much as they want. Programs wanting to use their own GCs are still free to do so with no restrictions on their data representation. In addition to satisfying everyone's needs this should be relatively easy to implement.

Andreas Rossberg wrote:

It is very hard to write and tune a GC, especially if you don't know which hardware you are running on. The engine has more knowledge.

True but the impact of tuning a GC is much less than the impact of using an inefficient one-size-fits-all GC design.

the GC in the mono port of .NET is about half a meg of Wasm code alone.

Mono is an extreme example, of course. How big is OCaml's GC for comparison?

Go is very special among GC-ed languages because it chose to allow interior pointers

.NET has had byref for many years and now has Span too. Furthermore, interior pointers are demonstrably good so future GC'd languages are likely to want this too.

Wouter van Oortmerssen wrote:

Many language implementors will decide to still bring their own GC because they feel they need a certain level of control we don't provide

Precisely. Personally, the last thing I would want is a uniform data representation when I have complete static type information. The WASM GC design I last saw was an OCaml-like design that is known to leave a lot on the table.

rossberg commented 4 years ago

@jdh30, in Wasm implementations that are embedded in a JavaScript VM, like on the Web, the JS and Wasm heap will share the same GC anyway, so no API is even necessary. Better integration with embedder memory management is one of the motivations for the basic reference types proposal that we split off from the GC proposal.

aardappel commented 4 years ago

@jdh30: @kripken and I looked into the possibility of having the engine's GC (which could be JS) coordinate with an arbitrary GC in linear memory, but the details of how to ensure cycles are collected efficiently/immediately without downsides have so far proven tricky. If you have more precise ideas on how this would work, that would be interesting to hear.

And yes, freedom for language implementations to have their own memory/bit layout, make their own decisions on what is static and what is dynamic, and to be able to work fluently with existing runtime code and data is very important.

Note that this need not be either/or, both JS-centric GC (current proposal) and linear memory centric GC should probably be explored in parallel, and depending on the language needs, one is going to be more suitable than the other. What I'd like to avoid is a language being stuck between two bad options: a built-in GC that is a giant impedance mismatch with the current language's implementation, and linear memory that forces each language to figure out cycles, interop etc for themselves.

There are other scenarios for which we may want to start thinking about cycles and interop: interface types would allow 2 wasm modules, each written in a different language with its own memory management (GC or not), to form a program together. Right now interface types operate under a "share nothing" approach (copying semantics), which obviously avoids any cycle issues, but as soon as someone starts using some kind of id/index to refer to data of the other, these same problems will pop back up. @jgravelle-google

jdh30 commented 4 years ago

Andreas Rossberg wrote:

the JS and Wasm heap will share the same GC anyway, so no API is even necessary

Does that impose a data representation on the WASM side?

rossberg commented 4 years ago

@jdh30, Wasm just defines the ability to allocate and access tuples and arrays (and later flattened aggregations thereof) of mixed types. How exactly they are represented in the engine's heap is ultimately up to the engine. There is no way to observe it.