WebAssembly / gc

Branch of the spec repo scoped to discussion of GC integration in WebAssembly
https://webassembly.github.io/gc/

How much should GC be influenced by JS engine design? #125

Closed by tlively 2 years ago

tlively commented 4 years ago

One of the primary goals of the GC proposal is to enable seamless interop with JS, so it needs to be implementable in JS engines. But so far it is unclear whether the design of the GC proposal should be constrained to use mechanisms already present in JS engines or if it should allow for different styles of engines to make different object layout and optimization choices. On the one hand, the more the proposal differs from how JS engines work, the less likely it is to be acceptable to Web engines. On the other hand, many non-Web VMs would be able to take advantage of a design that makes fewer assumptions about specific implementation strategies.

One example where this problem has come up is in #119. @rossberg makes arguments based on what JS engines do:

in production engines, type descriptors, as used by the GC and for other purposes, tend to be represented as separate heap objects. For example, in existing Web engines...

While @RossTate argues that baking these assumptions into the GC design is limiting:

That is a particular implementation strategy with alternatives that language implementations choose between depending on how they expect the tradeoffs to apply to their application... All in all, there's a wide variety of implementation strategies for reified downcasting, and building in a particular one would seem to be counter to WebAssembly's design philosophy.

That's just one example, but I've noticed this disagreement causing friction in many discussions. To what extent should we consider and allow for implementation strategies not used by JS engines? Should it further be a goal for the proposal to allow for multiple implementation strategies so that engines and languages can individually make their own complexity/performance trade-offs, or is it OK to design around a single assumed implementation strategy because we also want portable performance?

ajklein commented 4 years ago

Practically speaking, since Web VMs are major stakeholders in WebAssembly, it seems expected and reasonable to me that their implementation concerns should influence the design of Wasm GC. Note that the exchange you quoted regards a piece of the MVP design that was explicitly influenced by feedback from V8's implementation work (see #91); in my view, this is the process working-as-intended.

While there's certainly a balance to be had between considering existing implementation constraints and hypothetical implementations, concrete implementation feedback is one of the most effective ways I've seen in the web standards world of making progress, and I wouldn't want to see implementation feedback from web VMs discounted.

tlively commented 4 years ago

Certainly JS engine feedback is extremely valuable in figuring out the details of any particular proposal, and I wouldn't want to see that feedback discounted either. This question is more about the broader discussion we've been having about alternative designs and what the goals of the GC proposal should be. The GC proposal must allow for the implementation strategies that Web engines will want to use, but it's unclear whether it should, may, or should not allow for additional standard implementation strategies as well. Our lack of consensus on this point is slowing us down.

For example, Web engines may not want to use complex pointer-tagging schemes, but a non-Web engine may have no problem using such schemes if doing so improved performance and the proposal allowed toolchains to request them. In this case, should we 1) try to include pointer-tagging schemes in the design (possibly post-MVP) to improve peak performance on non-Web engines, or 2) reject them because Web engines won't use them and they make the proposal more complex? Until we have consensus on a design principle to guide that decision, there's no way folks on either side of the issue can possibly reach an agreement.

RossTate commented 4 years ago

I have a few thoughts that are more meta than concrete at this point:

  1. Certainly JS engines should influence the GC design.
  2. I think @tlively's second example better exemplifies what I'm guessing is the intended question: should we consider only JS engines, in their current form, in the design process? There are two considerations here: other WebAssembly engines for which JS is not a primary concern, and the fact that JS engines are really web engines whose implementation techniques have changed many times and whose workloads will continue to change (*crossed fingers* especially if WebAssembly does well).
  3. @tlively's first example, I believe, actually has the directionality backwards. I was advocating for giving the application more control over how they implement a feature. One reason is that @rossberg's proposed design is known to make many call sites megamorphic where other implementation strategies would let those call sites be monomorphic. So @rossberg's design is less amenable to engine optimizations based on hidden classes. His design would also require engines to perform concurrent-hashmap lookups when constructing these parameterized RTTs, and to force synchronization whenever the concurrent hashmap misses and a new "canonical" RTT has to be generated. (Also, notice that the RTT indexing that was added per #91 has disappeared from the Post-MVP. Unfortunately, this index-in-the-type strategy does not work particularly well with applications of first-class RTTs.) The other implementation strategies I was enabling all still use object descriptors, so they don't require any new tech from JS engines to benefit from them.
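
For concreteness, a minimal sketch (in C, with hypothetical names) of the "type descriptors as separate heap objects" strategy being debated, using the common supertype-vector encoding in which a downcast is a bounds check, one load, and one pointer compare against a canonical descriptor; this is an illustration, not code from either proposal:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: every GC object carries a pointer to a type
   descriptor ("RTT") that is itself a separate heap object, and each
   descriptor stores the chain of its supertypes. */
struct TypeDesc {
    uint32_t depth;                  /* subtyping depth of this type */
    const struct TypeDesc **supers;  /* supers[i] = ancestor at depth i,
                                        with supers[depth] == the type itself */
};

struct Object {
    const struct TypeDesc *desc;     /* descriptor word in the object header */
    /* ... fields ... */
};

/* ref.cast-style check against a target type whose depth is known statically. */
static bool is_ref_of_type(const struct Object *obj, const struct TypeDesc *target) {
    const struct TypeDesc *d = obj->desc;
    return d->depth >= target->depth && d->supers[target->depth] == target;
}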

My interpretation of the feedback I have received on this topic is that extensibility concerns beyond what JS engines would immediately support are fine to take into consideration, but they should not come at a cost to what JS engines would currently use, and they should be considered low priority.

So focusing on @tlively's second example, i.e. application-directed pointer tagging, this feedback would suggest to me something like #119. It addresses the concerns raised in #118 at no performance cost, but unlike the SOIL Initiative's proposal it does not itself directly provide the functionality for application-directed pointer tagging. Instead, it regards pointer tagging as low priority and simply ensures there is room to add the feature at a later time if the CG ever chooses to do so.

jakobkummerow commented 4 years ago

To be clear, from a web engine's perspective, we certainly wouldn't want our current design to be set in stone: as Ross points out, we've gone through many iterations of design changes, and we definitely value the freedom to explore other implementation techniques in the future. (Also, there's no such thing as the one current web engine implementation. To use the example of pointer tagging: V8 uses Smi-tagging and Spidermonkey uses NaN-boxing, so even if we did want to be very simplistic, we couldn't just bake "the one tagging scheme that web engines use" into the WasmGC design.)
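
As a rough illustration of the two tagging schemes mentioned above (a sketch only; the bit layouts here are made up, and the real V8 and SpiderMonkey encodings differ in detail and have changed over time):

#include <stdint.h>
#include <string.h>

/* Smi-style tagging: a word whose low bit is 0 holds a small integer shifted
   left by one; a low bit of 1 marks a heap pointer. Only payloads that fit in
   the word minus the tag bit can be encoded; larger values must be boxed. */
static uintptr_t smi_encode(int32_t v)    { return (uintptr_t)(intptr_t)v << 1; }
static int       is_smi(uintptr_t word)   { return (word & 1) == 0; }
static int32_t   smi_decode(uintptr_t w)  { return (int32_t)((intptr_t)w >> 1); } /* assumes arithmetic shift */

/* NaN-boxing: every value is 64 bits; doubles are stored directly, and other
   values live in the payload bits of a quiet NaN, distinguished by a tag in
   the high bits (the tag value below is invented for this sketch). */
#define TAG_INT32 0xFFF1000000000000ull
static uint64_t nan_box_i32(int32_t v)    { return TAG_INT32 | (uint32_t)v; }
static int      nan_is_i32(uint64_t w)    { return (w & 0xFFFF000000000000ull) == TAG_INT32; }
static double   nan_unbox_f64(uint64_t w) { double d; memcpy(&d, &w, sizeof d); return d; }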

That said, current JS engines are also representatives of general high-performance, high-complexity, long-lived virtual machine implementations. When (a majority of) web engines say "it's unlikely that we'll ever implement (or benefit from) X, because while it's certainly possible in theory, in practice the resulting implementation complexity seems prohibitive", then I think chances are that any other hypothetical future non-web Wasm engine may well arrive at a very similar conclusion. (Getting back to the tagging example, I'd also like to point out that there's no contradiction between the observation "there are VMs in existence that use tagging schemes X, Y, Z and are quite happy with that", and the prediction "it's unlikely that a future VM will let user code choose the tagging scheme to use, and will support loading/executing several modules specifying different tagging schemes at the same time".) In particular, when web engines predict that they won't implement X, that does have the consequence that for many use cases, Wasm's performance in practice will not benefit from any theoretical improvements that X might unlock if implemented.

TL;DR: I agree that it's a balance.

rossberg commented 4 years ago

One example where this problem has come up is in #119. @rossberg makes arguments based on what JS engines do:

in production engines, type descriptors, as used by the GC and for other purposes, tend to be represented as separate heap objects. For example, in existing Web engines...

To be clear, I was merely mentioning Web engines as one major example of a broader observation in this quote. So it's only partially related to the discussion at hand.

Personally, I do not think that Wasm's design should be specifically optimised for JS. However, as @jakobkummerow says, JS engines are the most dominant examples of versatile, high-performance VMs these days, so they should naturally inform the design on many levels.

tlively commented 4 years ago

Thank you, everyone, and sorry for the confusing initial wording of the question. This issue feels much clearer to me now.

My takeaway is that when a Web engine says it is unlikely to implement a particular optimization:

  1. That limits the number of WebAssembly users who could benefit from the GC design going out of its way to allow that optimization.
  2. That is a strong hint that other engines would not want to implement that optimization either.
  3. But it still might make sense to allow for that optimization in the design if it can be shown that there will be some benefit, for example if some other engine said that they would take advantage of it.

For design discussions, this means that we don't want to immediately discard ideas for features that enable optimizations that JS engines currently do not perform. Instead, we want to understand the complexity cost of the feature and its optimization and consider whether any engine, JS or otherwise, would want to implement that optimization.

tschneidereit commented 4 years ago

3. But it still might make sense to allow for that optimization in the design if it can be shown that there will be some benefit, for example if some other engine said that they would take advantage of it.

I think this might be less clear than it superficially seems. If the optimization leads to small performance or resource usage improvements, then it seems fine to support it, but it might not be worth much additional complexity. If, however, there's a significant performance gain to be had, and web engines will just never be able to take advantage of it, then that risks a bifurcation of ecosystems that I think we should try to avoid.

It's of course the case that there will always be lots of content only targeting some specific environment, and that's fine. But the risk here is about a more fundamental split, where toolchains would just entirely target only a subset of runtimes even for modules that aren't otherwise environment specific at all.

rossberg commented 4 years ago

Agreed with @tschneidereit. I suppose another way of phrasing this is that predictable performance remains an important goal, even if it gets considerably fuzzier with something like GC.

tlively commented 4 years ago

Good point. Hypothetically, if an engine were comfortable with an extra optimization that made a big difference, ecosystem bifurcation would only be a risk if modules had to be changed to take advantage of that optimization and those changes made performance worse on other engines. We would want to make sure that any such changes to modules would not affect performance on engines that don't support the optimization.

On a side note, is this what we mean by "predictable performance"? That a change to a module should never improve performance on some engines and worsen performance on other engines? That's a much more specific definition than I've seen before, but it makes sense.

conrad-watt commented 4 years ago

On a side note, is this what we mean by "predictable performance"? That a change to a module should never improve performance on some engines and worsen performance on other engines? That's a much more specific definition than I've seen before, but it makes sense.

I don't think this is an exhaustive characterisation of "predictable performance", but my understanding is that we want to avoid creating a scenario where engines implement type-feedback-based "unsound" optimisations for Wasm which have the potential to deopt and create performance cliffs.

I remember hearing strong opinions that this would be an explicit failure mode for the design of GC types, but it's possible my (second-hand) perspective is out of date.

tschneidereit commented 4 years ago

Hypothetically, if an engine were comfortable with an extra optimization that made a big difference, ecosystem bifurcation would only be a risk if modules had to be changed to take advantage of that optimization and those changes made performance worse on other engines.

I'm not sure I agree with this. A large enough performance difference can make a module effectively useless in one environment, but very useful in another. In theory, we could of course have a situation where there's a design that's demonstrated to have the best properties overall, including for web engines, but also has this property. In such a situation, I guess it'd make sense to go with that design.

I'm having a hard time believing that that's a likely outcome. And I think it's more useful to focus on designs that don't have this kind of risk to begin with.

titzer commented 4 years ago

I generally agree with what @ajklein said here. Web engines are an important stakeholder and incorporating their feedback is extremely important to WebAssembly's continued success. I love the idea of custom Wasm engines and embeddings (and am happy to finally be free to work on them!), but the big iron that holds up this world is the major Web engines.

Just to be clear on some of the calculus here, though. Thomas mentioned JS engine design in the issue here and several of the side issues have to do with implementation techniques that seem like just compiler/GC things. Web engine JITs and GCs can be refactored underneath to do all kinds of neat tricks like using fat pointers or crazy tagging schemes. That's a bounded amount of work. But when those values hit the JS API surface, and those values need to flow through the rest of the JS engine (ICs, runtime calls, JS reflection, prototype chains, the whole mess), then any representation trick for Wasm values can become insanely harder because of how huge a JS runtime is. That's less an "engine" issue and more a boundary issue and a JS-complexity issue, as there are literally hundreds of thousands of lines of JS runtime code vs. a much, much smaller amount of compiler and GC code. I feel like discussions around some techniques haven't really acknowledged this.

Basically, the upshot of the previous paragraph is that crazy tagging schemes and fat pointers are really difficult to make zero-cost in a Web VM, because JavaScript. To the extent that WebAssembly is going to offer anything like that, it is going to have to choose to work within that JS reality, or compromise on the predictable performance goal a little, as nutty values may end up being boxed if they are any more complicated than an i31ref. If we want to add complex, programmable tagging schemes that an engine has to decode and understand, the likely outcome is that web engines will absolutely cut corners and sacrifice performance for their own sanity. We then lose on both complexity and performance. We can't hold the cone of uncertainty open forever for future engines making a complicated thing fast with hypothetical heroics. In short, we need to ship simple things that are not dependent on hypothetical heroics.

Personally I think we need a layer between language runtimes and whatever GC mechanisms we develop that both solves the late-binding problem and allows for flexibility in choosing implementation strategies. Some of these problems become less pressing if layered properly. It can be done in a way that is mostly orthogonal to what is going on here, but I won't sidetrack the discussion here with that. I'll show more about what I have been up to shortly.

titzer commented 4 years ago

Hypothetically, if an engine were comfortable with an extra optimization that made a big difference, ecosystem bifurcation would only be a risk if modules had to be changed to take advantage of that optimization and those changes made performance worse on other engines.

I'm not sure I agree with this. A large enough performance difference can make a module effectively useless in one environment, but very useful in another. In theory, we could of course have a situation where there's a design that's demonstrated to have the best properties overall, including for web engines, but also has this property. In such a situation, I guess it'd make sense to go with that design.

This absolutely screams "layering" to me. If two engines have different mechanisms for implementing the same (source) thing, and modules are produced from source that target either one or the other techniques offline, then you need either conditional segments or to not do that. To not do that, you need to package up that source thing (e.g. a source type, a source operation, etc) to be lowered when the engine's supported mechanisms are known, either at link time, instantiation time, or runtime. That's a form of late-binding that can be done by a language runtime one layer up. Conditional segments only get you the ability to switch over techniques you know about now, while late binding gets you the ability to do absolutely anything in the future.

RossTate commented 4 years ago

The notion of "predictable performance" seems to me to miss some very important points: 1) everything is an abstraction, 2) many advances in this field have been made by exploiting those abstractions, 3) y'all are a bunch of competing browsers looking for ways to outdo each other, and 4) that competition is healthy and spurs innovation and adaptation.

  1. Everything is an abstraction. Even assembly level instructions are abstractions. They are often compiled down to micro-instructions. Even experienced low-level engineers have a hard time predicting which instruction sequences will perform best, and what performs best can vary by machine. One reason why video-game-console producers try to keep hardware variation across consoles minimal is so that video-game developers can essentially optimize for one precise piece of hardware, designing their data structures to align with precise cache sizes and so on. I find it unreasonable to expect that the same variation of a WebAssembly program will perform best on all engines, and to expect developers to be able to predict which variation is best for a given browser without trying them out.

  2. Recognizing assembly instructions as abstractions that could be compiled to micro-operations enabled significant improvements in hardware. At the same time, leaving them as instructions above micro-operations, rather than compiling programs straight to micro-operations, also enables the hardware to be adapted to resource constraints, changing workloads in programs, and new innovations in micro-operation design and implementation, and it enables programs to adapt to various hardware devices and to benefit from these improvements below the instruction-level abstraction. That is, pushing programs to lower-level formats does not necessarily lead to better or more predictable performance across systems. Even for a low-level instruction set, the question is always what is the right level of abstraction.

  3. As competing browsers, you will be looking for abstractions in WebAssembly you can exploit to get better performance. For example, in the current MVP it is pretty much trivial to pick out the references defining v-tables within a WebAssembly module. Once you do that, you can employ all sorts of v-table optimizations. For example, the immutability of v-tables means you can soundly apply your existing hidden-class techniques to employ speculative optimizations like guarded inlining that are known to significantly improve performance of many programs in other settings. If those optimizations indeed lead to the same performance improvements in the WebAssembly setting (after enough Java/C#/Kotlin/... programs are compiling to WebAssembly), once one of the browsers does this, the rest will be pressured to as well. And then, since WebAssembly programs cannot express guarded inlining themselves, they will start targeting this pattern to try to trigger these optimizations and reap their benefits. Was this the intended effect of the level of abstraction of the current MVP's design? No. Would web programs perform faster? Generally, yes. (A sketch of such a guarded call site appears after this list.)

  4. JavaScript runs as well as it does because of the innovations spurred by browser competition. I eagerly await seeing how amazingly fast the innovations spurred by that same competition will make a much better abstraction like WebAssembly perform. And yes, that competition means what the optimal variant of a program is will be different for each browser. But I'd rather have that than have even my optimal program perform equally slowly on all browsers. The easiest way to ensure consistency in a world with variation is to limit everyone to the worst of all worlds.
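
A minimal sketch (in C, with hypothetical types) of the guarded-inlining pattern from point 3: guard on the cached, immutable v-table reference, run the inlined method body on a hit, and fall back to generic dispatch (or, in a real engine, deoptimize) on a miss.

#include <stdint.h>

struct Shape;

/* Immutable v-table: a struct of function references, as in point 3. */
struct VTable {
    int32_t (*area)(const struct Shape *);
};

struct Shape {
    const struct VTable *vtable;   /* set once at construction, never mutated */
    int32_t side;
};

/* A call site the engine has observed to be monomorphic so far. */
static int32_t area_at_this_site(const struct Shape *s, const struct VTable *expected) {
    if (s->vtable == expected) {
        return s->side * s->side;  /* fast path: inlined body of the expected method */
    }
    return s->vtable->area(s);     /* slow path: ordinary indirect dispatch */
}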

We cannot predict performance. We cannot predict how programs and workloads will change. We cannot predict how browsers will evolve (and we certainly shouldn't expect them not to evolve). No matter what we do, the performance and implementation of WebAssembly will be an ever-changing landscape. We should not overly base our design decisions on these things we cannot predict.

(Yes, of course you can overexecute on this suggestion. Every suggestion should be just one of many considerations in any decision.)

titzer commented 4 years ago

Look, I love VMs. New and crazy optimizations are my bread and butter. I love writing compilers and garbage collectors and it's really been 20 years of fun and all that. But the hope of future optimization heroics is no excuse for a bad design. The reason VMs do heroics is because they are invariably trying to make some stupid interpreter with some bloody inefficient object model fast. That's some snark, but only barely. There is a reason why all the heavily engineered and optimized VMs are for legacy scripting languages with tons of code in the wild that cannot be changed. Optimization is what happens at the end of this whole process, when design options have been exhausted and we're stuck with something we can't fix and there are a trillion lines of code that still need to run.

Million line systems don't turn on a dime. Complexity and technical debt are real things and any calculus without them is going to make wildly wrong predictions of what is easy and what is hard, and therefore what is likely to happen in the short term versus the long term. What I mean, concretely, is that JavaScript engines are not magic cauldrons. JavaScript VMs are in the business of optimizing for JavaScript. Despite our plucky upstart here in Wasm land, there is still 10,000 times as much JavaScript code in the world. Big teams are focused, rightly, on optimizing that. Wasm optimization effort competes with JavaScript engineering effort. Wasm complexity competes with JavaScript complexity. Our demands are not challenges, but choices. Wasm effort is an investment under constant negotiation with competing concerns. It's dreaming to believe teams are going to just do us a magic rain dance one afternoon to implement the genius optimization, or some virtuous benchmark war is gonna get kicked off among competing engines. I was around for the JS benchmark wars. They sucked. V8 had to break out of a debt spiral to refocus on real world performance and language features and our technical debt from squeezing every ounce of performance for those benchmarks was real. And, by the way, the Wasm engine landscape is totally different than the JS engine landscape of ~2010.

Let's talk about "predictable performance". Many of us who spent long years working on JavaScript VMs realized the performance peaks we built and lauded ourselves over also gave rise to horrible performance cliffs jutting out of the landscape. Many here are pretty scarred by the amount of arguing we've had to do with unfortunate, frightened and helpless application programmers, and then the literal person-decades of engineering necessary to smooth out the hard cases as a long apology. Predictable performance means less arguing, less confusion, and less complexity. Predictable performance means applications and language runtimes have more agency; they can make effective decisions about what to do next if something sucks. And no, predictable performance doesn't mean the slowest common denominator. That's a strawman argument. No one is proposing the slowest possible thing. And that's because duh, slow things are exactly the thing that gives rise to both the opportunity for optimization and the subsequent performance cliffs! Slow (and complex) sucks. Fast and simple is better. Simple and slow is kind of OK, but kind of not OK.

Please read my comment again. The last part had a meaning, too. We need proper layering. We need to think carefully about what optimizations go where instead of just assuming we can stack all the smarts at the bottom. Me from 10 years ago might have thought that worked. I was more academic then. But now I see how important it is that we do not end up with a massive anchor somewhere which is the fantastical thing at the bottom that everything depends on, and yet without which everything runs horribly slow. I've wandered deep into that magical thing; it's less an oracle and more a hall of mirrors so intricate that no one could conceivably write another competitive implementation from scratch. That's a failure mode that many of us are actively designing against.

titzer commented 4 years ago

And yes, that competition means what the optimal variant of a program is will be different for each browser. But I'd rather have that than have even my optimal program perform equally slowly on all browsers. The easiest way to ensure consistency in a world with variation is to limit everyone to the worst of all worlds.

I just wanted to highlight this part because it is an excellent argument for layering and definitely not putting all the smarts in the bottom layer. It should be pretty obvious but I'll say it explicitly. It's the Wasm engine's job to adapt the Wasm to the hardware, and it's the language's compiler/runtime system's job to adapt the program to Wasm. Wasm cannot and will not understand all languages' constructs and therefore it will need layers above it.

RossTate commented 4 years ago

@titzer I'm worried the argument I made was misunderstood. You seem to be responding to an argument that we should not rely on engine's to achieve magic for WebAssembly to achieve good performance, but I never made such an argument. I simply pointed out that, no matter how we design WebAssembly, engines will likely find innovative ways to make it perform better than what the straightforward interpretation of its instructions would indicate.

In other words, the only reason WebAssembly seems to you to avoid the wars you dread is that it is not popular enough to drive real browser competition, and as such the browsers have not made the effort to look for real innovations.

No one is proposing the slowest possible thing.

There have been multiple times where I have suggested we design an abstraction so that programs could more directly express their needs, so that engines whose designs were well-suited to those needs could provide even better performance, and I have gotten pushback that we should not do this because it would cause less "predictable performance". For example, many programs need boxed integers or doubles, and so I suggested programs be able to more directly communicate that need so that engines that support integer packing or NaN boxing could provide these "boxed" values without actual allocation. But I was told doing so would be against "predictable performance" and so should not be done. That doesn't change the needs of these programs. It just gives them the worst of all worlds, in which all engines are (presumably) forced to box these values even if it would be trivial for them not to do so if the abstraction were better designed. That in turn, as you point out, prompts engines to look for more ad-hoc/hacky means to identify these patterns and optimize for them. (Also, #120 suggests that the current MVP's casting design is essentially a worst-of-all-worlds design.)
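
As a sketch of the kind of choice being argued for: if a module could state that a boxed i32 needs no object identity, an engine that supports small-integer tagging could satisfy the "box" without allocating, and fall back to a heap cell otherwise. The needs_identity flag and the tagging layout below are hypothetical, not part of any proposal.

#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t Ref;   /* a GC reference as seen by the engine */

static Ref box_i32(int32_t v, int needs_identity) {
    if (!needs_identity && v >= -(1 << 30) && v < (1 << 30)) {
        return (uintptr_t)(intptr_t)v << 1;   /* tagged immediate, low bit 0: no allocation */
    }
    int32_t *cell = malloc(sizeof *cell);     /* real allocation provides identity */
    *cell = v;
    return (uintptr_t)cell | 1;               /* heap pointer, low bit 1 */
}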

I was around for the JS benchmark wars. They sucked. V8 had to break out of a debt spiral to refocus on real world performance and language features and our technical debt from squeezing every ounce of performance for those benchmarks was real.

This is an argument that competition should not be centered around optimizing a suite of benchmarks. That's certainly true, but it's unrelated to the points I made, which apply to real-world performance.

Predictable performance means applications and language runtimes have more agency; they can make effective decisions about what to do next if something sucks.

It is control that gives runtimes more agency. But with host-managed GC, we cannot give programs direct control over how they represent their pointers or specify their object descriptors or walk the heap or use generational/conservative/copying garbage collection. That's what makes this proposal very different from the rest of WebAssembly. The best we can do is to let programs communicate their needs to the component of the system that has control, i.e. the engine, so that that component can best take advantage of its control to best serve the programs' needs. Removing options like "I need a boxed 32-bit integer" for the sake of predictable performance does not grant any agency.

titzer commented 4 years ago

You seem to be responding to an argument that we should not [sic] rely on engine's [sic] to achieve magic for WebAssembly to achieve good performance, but I never made such an argument.

When in actuality you wrote:

For example, in the current MVP it is pretty much trivial to pick out the references defining v-tables within a WebAssembly module. Once you do that, you can employ all sorts of v-table optimizations. For example, the immutability of v-tables means you can soundly apply your existing hidden-class techniques to employ speculative optimizations like guarded inlining that are known to significantly improve performance of many programs in other settings...

And also:

I eagerly await seeing how amazingly fast the innovations spurred by that same competition will make a much better abstraction like WebAssembly perform.

Forgive me for misunderstanding the part where you proposed a whole line of speculative optimizations as well as a general hope that things will magically work out as something that I should respond to in the way I did.

You now write:

It is control that gives runtimes more agency....The best we can do is to let programs communicate their needs to the component of the system that has control, i.e. the engine, so that that component can best take advantage of its control to best serve the programs' needs.

(Now I am going to respond to the words you wrote here, because you wrote them, or typed them, and I read them, and I am in good faith going to assume that you meant the words that you typed when you typed them, and then I copied them here, to prove, that you wrote them. I feel this is a regression in the level of the discussion if we have to proceed at such a low-level, but anyway...)

You cannot have predictable performance by simply offering a large selection of (complex) choices and making absolutely no guarantee about their performance in the hope of future optimizations. That is not control, that is not agency, that is just a recipe for a lose-lose situation where engines become complex because they have to support a large variety of options (most of which they will punt on), but are under no apparent obligation to make any of them fast. Predictable performance is a contract that inherently means limiting options, and that is absolutely necessary to combat the combinatorial explosion that leads to complex systems and technical debt, which I spent considerable time trying to explain above.

And because you ignored my comment about layering a second time, I'll restate it a third time and relate it to something you actually wrote.

Recognizing assembly instructions as abstractions that could be compiled to micro-operations enabled significant improvements in hardware. At the same time, leaving them as instructions above micro-operations, rather than compiling programs straight to micro-operations, also enables the hardware to be adapted to resource constraints, changing workloads in programs, and new innovations in micro-operation design and implementation, and it enables programs to adapt to various hardware devices and to benefit from these improvements below the instruction-level abstraction. That is, pushing programs to lower-level formats does not necessarily lead to better or more predictable performance across systems. Even for a low-level instruction set, the question is always what is the right level of abstraction.

Wonderful. Now replace s/assembly instructions/wasm/ and s/micro-operations/assembly instructions/ and out pops the concept of a Wasm engine. Do this again. Now replace s/assembly instructions/source language constructs/ and s/micro-operations/wasm/ and you are talking about a language runtime system. Do you understand what I am saying now? You need to have layers that deal with appropriate levels of abstraction. CPUs do not magically do JavaScript, because layers, and Wasm won't magically do language X, because layers. We gotta make layers work together instead of just picking one and making it responsible for the entire stack on top of it.

rossberg commented 4 years ago

Removing options like "I need a boxed 32-bit integer" for the sake of predictable performance does not grant any agency.

A boxed 32-bit integer can be trivially defined as follows:

(type $boxed-i32 (struct i32))

If you meant an unboxed 32-bit integer, then no, that cannot provide predictable performance. Because it's not representable on engines running on 32-bit hardware or using 32-bit compressed pointers on 64-bit hardware.

RossTate commented 4 years ago

Forgive me for misunderstanding the part where you proposed a whole line of speculative optimizations as well as a general hope that things will magically work out as something that I should respond to in the way I did.

I did not propose these optimizations. Others proposed them and have indicated their plans to employ these optimizations in CG meetings.

As for the quote, it simply states that I look forward to how innovation will make WebAssembly even better. There is no reference to relying on magic. Would you rather me look forward to WebAssembly implementations never improving?

And because you ignored my comment about layering a second time

You and I have discussed this, but the rest of the group hasn't. I didn't mean to offend, I just wanted to keep the conversation on topics that everyone had the same amount of context for.

(type $boxed-i32 (struct i32))

In the current MVP, this associates an identity to every reference, even if the program has no need for that identity. As such, even engines that could otherwise unbox this cannot. This is a worst-of-all-worlds solution to the program's need for boxed integers.

If you meant an unboxed 32-bit integer, then no, that cannot provide predictable performance. Because it's not representable on engines running on 32-bit hardware or using 32-bit compressed pointers on 64-bit hardware.

No one said it provides predictable performance. But its worst-case performance is the only thing the current MVP and Post-MVP support.

rossberg commented 4 years ago

(type $boxed-i32 (struct i32))

In the current MVP, this associates an identity to every reference, even if the program has no need for that identity.

Fair enough, and there is a TODO regarding that in the MVP doc. It would be nice to address that, though it's not a showstopper or an MVP must-have either.

RossTate commented 4 years ago

Fair enough, and there is a TODO regarding that in the MVP doc. It would be nice to address that, though it's not a showstopper or an MVP must-have either.

Sweet. And once that's figured out, then engines can optimize this by unboxing the integer, meaning we will no longer have predictable performance. Instead, we will have improved performance on some engines, and the same old worst-of-all-worlds performance on other engines. That is the point I am making - once the program is able to express its needs more precisely, engines will be able to optimize for those particular needs if they choose to do so. The program does not have to rely on those "magic" optimizations to get reasonable performance, but it will benefit when those optimizations happen to be available.

In a setting where programs cannot be given direct control and instead have to work through a higher-level abstraction, the more informative that abstraction is, the more opportunities (not necessities) for optimization there are. These optimization opportunities also make for more variation in program performance, but all those variations are better than performance without those optimizations. So, in a setting where programs need to use an abstraction, the goals of better performance and more predictable performance conflict with each other.

jakobkummerow commented 4 years ago

I brought up in the Requirements doc discussion that we may want to qualify what we mean by "(predictable) performance" -- as this discussion here illustrates nicely, it's not so obvious what kinds of expectations are realistic or reasonable.

My take is that aiming for "good baseline performance" captures the essence of, and is more accurate than, clinging to the notion of "predictable performance".

"Predictable performance", taken literally, is a myth. Not even machine instructions have predictable performance. I'd go as far as saying: any time a system lowers an abstraction, there is by definition some degree of freedom in how to do that, and that freedom creates differences in performance (between implementations, between different versions of the same evolving implementation, between different situations that this implementation encounters); and such differences are necessarily hard or impossible to predict.

I think one underlying desire we might agree on much more easily is that we want to avoid cases of pathologically bad performance. And as a language specification, the best way to contribute to this outcome is to design in such a way that even simple implementations will end up delivering reasonable performance. That doesn't stop advanced implementations from eventually becoming (maybe significantly) better than such a baseline -- what matters is that the baseline doesn't suck. (I realize that that's easier said than done, in part because the definition is relative.)

I think another corollary is that layering is, in a way, part of both the problem and the solution: each layer will present its own performance mysteries to the layers above it; at the same time, good layering probably gets us closer to the ideal of reasonable baseline performance, because each layer is solving a simpler problem, which is one reason why it can do a better job with less effort than a grand all-encompassing monolithic system could. (I for one am very much looking forward to what @titzer will present.)

tlively commented 4 years ago

There are a few different meanings of "optimizations" being used here, and I could have been clearer in my question and examples.

In my original question, the "optimizations" I was thinking of were improvements to the expressiveness of the Wasm object model that would allow modules to provide hints about the best layouts for their objects. This increased expressiveness would certainly increase complexity and could lead to performance differences between engines that support different kinds of layouts, which would be bad for predictable performance, but only between engines.

The other kinds of optimizations we've mentioned are speculative optimizations, which also increase complexity and are bad for predictable performance within even a single engine because they create performance cliffs. I wasn't thinking of these optimizations in my original question because I was operating under the assumption that they were off the table for WebAssembly engines.

The reason VMs do heroics is because they are invariably trying to make some stupid interpreter with some bloody inefficient object model fast... Optimization is what happens at the end of this whole process, when design options have been exhausted and we're stuck with something we can't fix.

I would like to avoid having an inefficient object model and getting stuck with something we can't fix. I would also like to avoid needing Wasm engines to adopt all the heroics of JS engines. That's why I was wondering if it would make sense to have Wasm engines give modules more control over their object representations, even though Web engines are currently opinionated about their object representations.

@jakobkummerow, I have gotten the impression that you would be more ok using speculative optimizations in the GC implementation, so it would be great to hear your thoughts on this (or if that impression is wrong). I also agree that these design directions would be more clear-cut if we clarified what we meant by predictable performance.

@titzer I look forward to your proposal about late binding and layering. It sounds like an interesting approach, and I am eager to see how it could simplify or inform the GC design :)

jakobkummerow commented 4 years ago

bad for predictable performance, but only between engines.

I don't agree with the "only" part here. For one thing, engines are not static, they evolve. "That thing will work fine, but not on {client | server | OS} versions before X" is not a great state to be in. (Example: JavaScript developers are forced to use tools like Babel to transpile their code to ES5 for the benefit of those 10% of their users who are stuck on old browsers.)

Also, implementation reality will be more complicated than "engine A supports custom layouts, and engine B doesn't". Instead, for instance, maybe an engine will give you fast support for custom tagging schemes as long as there are no more than N different custom tags. Or only if you don't use the respective objects in, say, cross-module function calls. Or it could turn off more expensive compiler optimization passes on slow and/or memory-constrained (mobile) devices. The possible combinations of "feature X is supported, but only if Y / not if Z" are endless. (By "supported" here I mean: giving you the performance benefit you were hoping for.)

That said, my conclusion is not "it's even worse for predictable performance than you claim, so we can't have it", but instead "performance isn't going to be predictable no matter what we do, so there are other considerations that matter more". (Also, I'm trying to speak generically here, not to express an opinion on the specific feature example.)

you would be more ok using speculative optimizations

Oh, I love it when engines can get away with not having to do speculative optimizations: simplicity (of the implementation itself, of the mental model one needs to understand how stuff works under the hood, etc) is awesome! And I think we should try very hard to design WasmGC such that engines that don't speculate can do a very good job of delivering good performance.

That said, if Wasm-with-GC becomes as successful as I hope it will be, and if we accordingly venture into performance echelons that I hope we will reach, then pragmatically, I expect that "fancy" engine-side optimizations will sooner or later become part of the picture. That doesn't even have to be literal speculation (with potential deopts = throwing away mis-speculated code), it could just be having fast paths for some situations and slower paths for others.

Concretely, in V8, we don't do speculative optimizations for Wasm code yet, and introducing it would be a fairly big effort (and come with obvious drawbacks), which we wouldn't mind avoiding, but we suspect that at some point we will have to do it. I'd love to be proven wrong on this :-)

tlively commented 2 years ago

We've settled into a productive design process with a strong implementer feedback loop and moved to phase 2, so I'll go ahead and close this issue.