[feature] Optional and dependent features

tlively commented 6 years ago

Overview

Additions to the wasm spec ("features") depend on each other to varying degrees. On one end of this spectrum are features like GC and anyref, for which it is impossible to implement one (GC) without the other (anyref). At the other extreme are features like sign extension and SIMD that seem completely independent from one another, but are in fact loosely related because some toolchain logic for implementing SIMD ops is simpler in the presence of sign extension operations.

These feature dependencies matter when different engines have implemented different sets of features. When this is the case, toolchain developers and application developers need to know what kinds of feature sets exist so they can write tools and create binaries that take full advantage of the features available on their target platforms.

In a perfect, spec-compliant world, all engines would implement the full WebAssembly specification and users would only need to create one wasm binary to target all WebAssembly engines, up to ABI differences that are out of scope for this discussion. However, the world is not perfect and spec-compliant. Engines in different contexts will implement different feature sets. For example, engines designed for blockchain or embedded applications may reasonably choose not to implement threads.

Given the reality that not all engines will implement the full spec, how do we want to help toolchains and app developers reason about what kind of feature sets they need to support? Based on discussion in the 21 August 2018 CG meeting, it looks like there is already general consensus that some features such as threads should be designated "optional," though questions remain about which features qualify and what exactly that designation would mean. This issue can be broken into three questions with answers explored below.

What features should qualify as optional?

1. None

Do not allow for leaving out WebAssembly features in either normative or non-normative text. Anyone creating an engine implementing a subset of WebAssembly features will also be taking on the burden of explicitly coordinating with the maintainers of any other engine they want to be compatible with, rather than having an official specification to implement.

Pros:

Simplicity: Official tests would have to support only one configuration per platform.
Focus: The specification would describe major, full-featured implementations with nothing extra.

Cons:

Exclusivity: Niche and resource-constrained use cases where the full feature set of WebAssembly is not reasonable would no longer be officially supported.
Fragmentation: Many non-conforming implementations would arise independently, and would not necessarily be compatible with each other or use the same tools.

2. Only very large or resource-intensive features

Make features non-optional by default, making an exception only for features that cannot be reasonably implemented for use cases we explicitly want to support. For example, we want to support using WebAssembly in blockchain VMs that cannot support threads or any other source of nondeterminism. SIMD instructions may also be optional under this definition since embedded platforms and other platforms without native SIMD instructions may wish to omit them. This option raises the question of what criteria a feature would need to meet to become optional.

Pros:

Simplicity: There are only a few officially supported configurations to test.
Inclusivity: Many niche use cases would be officially supported.

Cons:

Maintenance burden: Engines have to implement new non-optional features to stay compliant.

3. Any logical grouping of features

Features default to being optional, but can still depend on each other. For example, anyref and GC would both be optional, but to have GC, a compliant engine would also need to have anyref. Essentially any feature added post-MVP could be optional.

Pros:

Specialization: There would be little reason not to make a spec-compliant engine for niche purposes, since it would be so easy to be spec-compliant.
Forward compatibility: Engines can maintain spec compliance without implementing new features. Lack of forward compatibility is already a problem we are hearing about.

Cons:

Complexity: Test matrices become large, and could potentially grow quickly over time. The spec document would be more complicated as well.
Fragmentation: A large diversity of implementations means more work for developers trying to target them all and a very small common feature set among compliant engines.
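
To make the test-matrix concern concrete, here is a minimal sketch that enumerates the configurations a test suite would have to cover. The feature names and the single dependency (GC requires anyref) are assumptions taken from the examples in this issue, not an agreed list.

```python
from itertools import combinations

# Hypothetical optional features, used only for illustration.
OPTIONAL_FEATURES = ["threads", "simd", "anyref", "gc"]
# Hypothetical dependency: GC requires anyref.
REQUIRES = {"gc": {"anyref"}}


def valid_configurations(features, requires):
    """Yield every subset of `features` that contains the dependencies of each member."""
    for size in range(len(features) + 1):
        for subset in combinations(features, size):
            chosen = set(subset)
            if all(requires.get(f, set()) <= chosen for f in chosen):
                yield frozenset(chosen)


configs = list(valid_configurations(OPTIONAL_FEATURES, REQUIRES))
print(len(configs))                 # 12 valid configurations
print(2 ** len(OPTIONAL_FEATURES))  # 16 subsets if nothing depended on anything
```

Dependencies prune the matrix only slightly; each additional independent optional feature still doubles it.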

How should optional features and feature sets be specified?

1. In an appendix as an explicit dependency graph

Any subgraph of the dependency graph that includes the MVP feature set and is closed under dependencies would cover an officially supported subset of features. This is equivalent to specifying a complete list of supported subsets of features. This approach would require figuring out what criteria must be met by a pair of features for a dependency to be specified between them. Does one have to be unimplementable without the other, or is it enough that there is no reason not to implement one if the other is already implemented?

Pros:

Flexibility: This is the most general approach and can scale to support a large number of optional features.

Cons:

Complexity: A dependency graph is more complicated than other solutions and is unable to disambiguate any interactions between features on its own.
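
As a rough illustration of the dependency-graph option, the sketch below treats a feature set as supported when it contains the MVP and includes everything its members depend on. The graph contents and the names `DEPENDS_ON`, `closure`, and `is_supported_subset` are hypothetical; the real edges are exactly what would need to be decided.

```python
# Hypothetical dependency graph: each feature maps to the features it requires.
DEPENDS_ON = {
    "mvp": set(),
    "anyref": {"mvp"},
    "gc": {"anyref"},
    "threads": {"mvp"},
    "simd": {"mvp"},
}


def closure(features, depends_on=DEPENDS_ON):
    """Return `features` plus everything they transitively depend on."""
    result = set()
    pending = list(features)
    while pending:
        feature = pending.pop()
        if feature not in result:
            result.add(feature)
            pending.extend(depends_on[feature])
    return result


def is_supported_subset(features, depends_on=DEPENDS_ON):
    """A subset is supported if it contains the MVP and is closed under dependencies."""
    features = set(features)
    return "mvp" in features and closure(features, depends_on) == features


print(is_supported_subset({"mvp", "anyref", "gc"}))  # True
print(is_supported_subset({"mvp", "gc"}))            # False: gc needs anyref
print(sorted(closure({"gc"})))                       # ['anyref', 'gc', 'mvp']
```

Note that the check says nothing about how two independent features interact when both are present, which is the limitation described in the con above.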

2. In an appendix as a change log

This simpler approach would add an append-only change log to the appendix of the spec. Any prefix of this log would be a supported subset of features, maintaining forward compatibility.

Pros:

Simplicity: This approach would require almost no additional effort from a specification point of view.

Cons:

Inflexibility: This approach doesn't really allow for truly optional features.
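
A minimal sketch of the change-log option, assuming a hypothetical log whose order is made up for illustration: a feature set is supported only if it equals some prefix of the log.

```python
# Hypothetical append-only change log; the order is illustrative only.
CHANGE_LOG = ["mvp", "mutable-globals", "sign-extension", "multi-value", "threads"]


def is_supported_prefix(features, log=CHANGE_LOG):
    """True if `features` is exactly the first k entries of the log for some k >= 1."""
    features = set(features)
    k = len(features)
    return 1 <= k <= len(log) and set(log[:k]) == features


print(is_supported_prefix({"mvp", "mutable-globals"}))  # True
print(is_supported_prefix({"mvp", "threads"}))          # False
```

The second call shows the inflexibility: under this assumed ordering, an engine cannot skip the entries that precede threads and still claim threads, nor implement a later entry while omitting threads.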

3. In normative text

Write the available subsets of features into the text of the spec, resolving any ambiguities about behavior in the presence or absence of any optional feature.

Pros:

Clarity: Any ambiguities about how optional features interact would be spec bugs and would be corrected.

Cons:

Scalability: Having conditional normative text might increase the size of the spec considerably and become unreadable as the number of features grows, depending on how it was written.

What feature subsets should be supported by the tools?

Clearly the tools need to support any officially blessed subset of features, but they could potentially also support unofficial feature subsets, for example if an engine with a non-conforming set of features becomes a popular platform. A major reason an implementation would want to be spec-compliant is to have this guaranteed tool support. Since this is not directly a spec question, we can leave it to be discussed elsewhere.

cretz commented 6 years ago

I vote no features are optional, or at least whether they are optional shouldn't be programmatically specified. Compiling a binary, for most practical purposes, requires knowledge of the target, and if that target doesn't implement GC or threads or whatever then the caller of the compiler should pass in those options. I think what would really cause fragmentation is legitimizing optional features.

Not implementing the entire spec should be the exception instead of the rule, and engines that do so should say so in their docs. Granted, compiler writers may choose to never make some features optional unless there are enough requests, but I'd take that over codifying selectable language features in the spec. At the least, this approach allows y'all to punt on this question until the ecosystem is clearer about how it treats some of these features. As alluded to of course, feature dependency can get complex so just keep reasonable boundaries for now.

rossberg commented 6 years ago

This is very useful.

Feature Sets: I would have a very strong preference for only making large feature sets optional, for the reasons described. Ultimately, we will need to decide on a case-by-case basis, but of all proposals currently in progress I only see three groups that justify being made optional:

Threads
SIMD
GC

Each of them is complex enough and has severe enough implications on the language or its implementation that there are strong reasons to assume that they will not be needed or desirable -- or even possible -- in all environments.

Spec: I think the only reasonable alternative is an Appendix identifying optional "subsets" of features. With the three features above there isn't actually any dependency. However, there might be the slightly different complication of overlaps, i.e., certain constructs that exist only in the intersection of two feature sets -- consider e.g. potential "shared" types or atomic instructions for accessing GCed objects.

One consideration also is to make it easy enough for VMs or embeddings to specify what language variant they support. The fewer options the better.

Versions: There is an orthogonal axis which is the version of the standard that an implementation (fully) supports. Some of the discussion before was conflating these issues, I think. Versions are strictly linear. I don't think it's a good idea to feature-test individual new features -- that creates a large test matrix and isn't particularly useful in a low-level language like Wasm. If we assume that a new version of the standard is published every year or so, it should be fine-grained enough to distinguish on a version basis only. But we might want to think about providing an easy way to test for the smallest version that is fully supported by a VM.

rossberg commented 6 years ago

@cretz, it is going to be a fact of life that some environments won't or can't support certain features (e.g., we already know for sure that blockchain environments cannot allow threads). It would not serve anybody if the standard pretended otherwise. It's much more useful to codify agreed-upon definitions of the exact extent of these limitations.

That said, I'd want to keep the set of options as small as possible (but no smaller).

lukewagner commented 6 years ago

@rossberg On the topic of how to specify optional features, are you imagining an Appendix that is fully precise as to the exact expected behavior when a feature is absent? If not, I definitely expect to see little incompatibilities creeping in at the fringes of these big features, which seems like a problem. If so, what are you imagining the specification technique to be: is it more an informal route, where there's prose to modify the formal rules of the core spec, or a formal route, like a predicate that takes a module's AST?

rossberg commented 6 years ago

@lukewagner, yes, I'd prefer to be precise enough to rule out any incompatibilities. Good question how. It's hard to predict what exactly will be needed, but I'm hoping that it is enough to exclude certain constructs from the (abstract) syntax, i.e., certain instructions, types, etc. Then any binary that would decode into those is rejected. A predicate probably is the most refined way of specifying that, though perhaps an enumeration of syntax constructors already is precise enough. (Would probably be more compact and more modular to blacklist rather than whitelist.)
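
One possible shape for such a predicate, as a toy sketch only: the `Module` class is a stand-in for a decoded module's abstract syntax, and the per-feature constructor lists are a small illustrative sample, not a complete or agreed enumeration.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Toy stand-in for a decoded module: just the instruction names it uses."""
    instructions: list = field(default_factory=list)


# Hypothetical mapping from an optional feature to syntax constructors it adds.
FEATURE_CONSTRUCTORS = {
    "threads": {"i32.atomic.load", "i32.atomic.store", "memory.atomic.notify"},
    "simd": {"v128.load", "i32x4.add"},
}


def blocked_constructors(omitted_features):
    """Collect every constructor excluded by the features an engine omits."""
    blocked = set()
    for feature in omitted_features:
        blocked |= FEATURE_CONSTRUCTORS[feature]
    return blocked


def accepts(module, omitted_features):
    """Predicate over the abstract syntax: reject modules using a blocked constructor."""
    blocked = blocked_constructors(omitted_features)
    return not any(instr in blocked for instr in module.instructions)


module = Module(["i32.add", "i32x4.add"])
print(accepts(module, {"threads"}))  # True: no thread constructors are used
print(accepts(module, {"simd"}))     # False: i32x4.add is excluded
```

Listing only the excluded constructors keeps each feature's entry self-contained, which is the compactness and modularity argument for excluding rather than enumerating what is allowed.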

AndrewScheidecker commented 6 years ago

Glad to see this discussion.

I think that specifying the interaction of the optional features in normative text is really desirable. If the features are just in independent appendices, there will be under-specified interactions. The work to specify the interactions can't be avoided, only deferred.

In WAVM, I have toggle-able support for nearly all proposed extensions (importing mutable globals, simd, atomics, exception handling, non-trapping float-to-int, sign-extension ops, multi-value, bulk-memory-operations, and reference-types). That's more optionality than we're talking about for the spec, and the code that checks feature flags is isolated to the validation layer.

Would it be practical to use that approach in the specification? i.e. include all features in the structure, execution, text, and binary sections of the spec, and only specify optional subsets in the validation section?

rossberg commented 6 years ago

I don't think you want to clutter validation with this in the spec. It should be possible to nicely decouple it by just defining what syntax is (dis)allowed. That can be accurately specified independent from everything else.

7ombie commented 2 years ago

This is another example of why the Web should have its own standards, and not have to worry about crypto etc.
