I've noticed that there is isAsync and AwaitExpression in es6.webidl, but they weren't actually part of ES6 and, correspondingly, also not supported by currently included version of the Shift parser.

They should be moved out to a separate IDL file or the file should be renamed not to include a version.

iiuc, "es6" there actually doesn't mean ES6, but ES6+. but yeah, having separate extension IDL for each version might help future update.

The toolchain should mostly support several IDLs already.

Before we use it, we'll need to solve a few issues, though:

Order inside enums and sum types. Since context encoding depends on such an order, we want language extensions such as AwaitExpression to add stuff at the end of the file. Not a big deal, but needs to be done.
We'd need to fork all the nodes affected by this issue into an es6 version without isAsync (e.g. FunctionExpression) and a post-es6 version with isAsync (e.g. FunctionWithAsyncExpression) and introduce the *WithAsync values into the relevant sum types. Actually, we could entirely get rid of isAsync and replace this with FunctionExpression/AsyncFunctionExpression, MethodExpression/AsyncMethodExpression...

Does anyone wish to work on this? Otherwise, I can take a look once we're more advanced with our current context-0.1 sprint.

talked with Yoric, and here's summary.

regarding IDL versions (and format versions), there are several things to consider.

here's the example based on async function' case:

How to extend AST (IDL)

1. just add field to existing interface

define an extension in IDL that adds field to existing interface. the resulting interface will have new field.

es6.webidl

interface EagerFunctionExpression : Node {
  attribute boolean isGenerator;
  attribute BindingIdentifier? name;
  attribute unsigned long length;
  attribute FrozenArray<Directive> directives;
  attribute FunctionExpressionContents contents;
};
interface LazyFunctionExpression : Node {
  attribute boolean isGenerator;
  attribute BindingIdentifier? name;
  attribute unsigned long length;
  attribute FrozenArray<Directive> directives;
  [Lazy] attribute FunctionExpressionContents contents;
};

typedef (EagerFunctionExpression or
         LazyFunctionExpression) FunctionExpression;

es7.webidl

[ExtendsInterfaceField=EagerFunctionExpression]
interface EagerFunctionExpression : Node {
  attribute boolean isAsync;
};
[ExtendsInterfaceField=LazyFunctionExpression]
interface LazyFunctionExpression : Node {
  attribute boolean isAsync;
};

Pros

there's only one interface in latest version
if implementation don't support older AST, this simplifies the code

Cons

the new AST is completely incompatible with older version
if implementation should support older versions, it's the spec IDL defined as a subset of extensions (might be complicated)

2. add yet another interface for new type

add interfaces that corresponds to the language extension. if that can be expressed with bool, no field will be added. (EagerFunctionExpression for sync function and EagerAsyncFunctionExpression for async function)

es7.webidl

[ExtendsTypeSum=FunctionExpression]
interface EagerAsyncFunctionExpression : EagerFunctionExpression {
};

[ExtendsTypeSum=FunctionExpression]
interface LazyAsyncFunctionExpression : LazyFunctionExpression {
};

the result of FunctionExpression sum will be:

typedef (EagerFunctionExpression or
         LazyFunctionExpression or
         EagerAsyncFunctionExpression or
         LazyAsyncFunctionExpression or) FunctionExpression;

Pros

Existing interfaces are still compatible with older AST (older AST is still valid in the new spec)
The latest IDL (with all extensions applied) represents all possible AST (old or new)

Cons

If we add multiple flags to same interface across versions (like, if generator was added later), the number of interfaces can explode

3. add yet another interface with flag

add interfaces that has flag, that corresponds to the language extension. existing interface still exist.

es7.webidl

[ExtendsTypeSum=FunctionExpression]
interface EagerFunctionExpressionES7 : EagerFunctionExpression {
  attribute boolean isAsync;
};

[ExtendsTypeSum=FunctionExpression]
interface LazyFunctionExpressionES7 : LazyFunctionExpression {
  attribute boolean isAsync;
};

the result of FunctionExpression sum:

typedef (EagerFunctionExpression or
         LazyFunctionExpression or
         EagerFunctionExpressionES7 or
         LazyFunctionExpressionES7 or) FunctionExpression;

Pros

Existing interfaces are still compatible with older AST (older AST is still valid in the new spec)
The latest IDL (with all extensions applied) represents all possible AST (old or new)

Cons

the number of interfaces may increase for each version up
- there should be some naming convention, otherwise it will be unclear which one means what
there can be multiple ways to express certain AST
- "EagerFunctionExpression" and "EagerFunctionExpressionES7 with isAsync=false" are same thing

How to extend format

in any case above, "context" format's model table becomes incompatible with older versions. this issue happens regardless of whether the encoded JS code itself uses the new syntax or not.

if we add field, the field's probability part will have extra data
if we add entry to sum, the sum's probability part will have extra data

so, things to consider:

should the format support future extension of the IDL, in several ways?
- unknown field may be added to a interface
- unknown interface may be added to sum
- type of field may be changed (maybe same thing as above)
how much does it affect the compression?
- at least current "context" format has some unused bits in model definition part, but that's temporary thing

What implementations should support

should implementation support older IDL and format?
- if server returns older version, what should the implementation do?
- should implementation request with detailed version number for supported version? (Accept header)
should implementation support unknown newer IDL and format?
- should new IDL/format somehow backward-compatible?
- if server returns newer version that implementation doesn't know, what should it do?
- should define error semantics about unknown things
  - if the format uses newer IDL but the code doesn't use it, should it run wihtout error?
  - in plain JS case, SyntaxError happens only when new unknown things is actually used in the code
should encoder support older IDL and format?
- if the target JS code doesn't use newer things (like, if it's plain ES5), should encoder use older IDL?

My take on this is that 3. add yet another interface with flag is the closest to the current behavior of JS, so we should strive for this. This assumes that the codec has a way to encode this without size/performance loss. I feel it's feasible to evolve context-0.1 in this direction.

Note that we can actually have es7.webidl

[ExtendsTypeSum=FunctionExpression]
interface EagerFunctionExpressionES7 : EagerFunctionExpression {
  attribute boolean isAsync = false;
};

[ExtendsTypeSum=FunctionExpression]
interface LazyFunctionExpressionES7 : LazyFunctionExpression {
  attribute boolean isAsync = false;
};

The = false ensures that we can automatically convert {LazyFunctionExpression, EagerFunctionExpression} to {LazyFunctionExpressionES7, EagerFunctionExpressionES7}. In practice, this means that we only write a parser for {LazyFunctionExpressionES7, EagerFunctionExpressionES7} and the auto-generated parser will (somehow) reuse these methods to generate the {LazyFunctionExpression, EagerFunctionExpression} methods.

As a compression side-note, this looks very much like a simple case of TreeRePair. Maybe looking at TreeRePair will give us another way to look at the problem.

Now, on your detailed questions, here's my take.

should implementation support older IDL and format?

Definitely. Firefox 124 cannot afford to reject files designed for Firefox 123 :)

if server returns older version, what should the implementation do?

We need to find a way to make it read the older version without over-complicating the implementation.

should implementation request with detailed version number for supported version? (Accept header)

As this is not what happens with JS nowadays, I would like to avoid this.

should implementation support unknown newer IDL and format?

I'm not sure what you mean by "newer format".

For newer IDL, I'd like to support it whenever possible. Just because the IDL and the encoder have changed, we do not want to stop supporting older browsers if we're only using older features.

   should new IDL/format somehow backward-compatible?

I'd like to.

   if server returns newer version that implementation doesn't know, what should it do?

If it's a newer version of the codec/container, we should do whatever browsers do when they receive a format they don't understand.
If it's a newer IDL but we're not using the features, we should make it work.
If it's a newer IDL using a new feature that we don't know, raise a SyntaxError.

should encoder support older IDL and format? if the target JS code doesn't use newer things (like, if it's plain ES5), should encoder use older IDL?

Good point. I think it should try.

The format prototype in fbssdc punted on grammar versioning/evolution, but that should be fixed. I think we should wait until we have data from serving this format.

Specifically, if file sizes are still a problem, I think it's likely the next iteration of the file format will want to have multiple models per field, and start to exploit some of the regularities between different kinds of fields which are barely exploited today. A lot of nice solutions for evolution available with tweaks to the simple format we have today would be invalidated by these constraints.

It's still worth thinking about the shape of the problem now though.

We need to think specifically about the kinds of changes we might encounter. I think those are:

Adding a field to an interface.
Making a field lazy, or a lazy field eager.
Adding a value to an enum.
Adding an interface to a set of alternatives. The root type Script|Module may be a special case of this.
Adding an interface.
"Deprecating" a field, interface or enum value. This is admittedly an unlikely kind of change for JavaScript.
Fundamental changes like adding a new primitive type, introducing nested arrays, or new combinations of alternatives such as array or interface, interface or primitive, etc.

We should also think about goals. For example:

If files which used new features could remain backwards compatible, adopting new features would be easier. (For example, say we added some kind of performance hint which a legacy user agent could ignore.)
It would make feature design easier if user agents could ship experimental grammar changes without causing mayhem.

Achieving all of these will be hard but we can at least understand the trade-offs we are making.

@dominiccooney I'll answer your message more completely when I have more time, just a few quick points.

I think we should wait until we have data from serving this format.

As far as I understand, Cloudflare and Facebook have different priorities here. Cloudflare prioritizes grammar compatibility.

Specifically, if file sizes are still a problem, I think it's likely the next iteration of the file format will want to have multiple models per field, and start to exploit some of the regularities between different kinds of fields which are barely exploited today.

We still need to run tests, but I don't think that file sizes are a problem at the moment, as long as we have proper support for codec dispatch. In particular, as soon as we actively work on streaming, we'll be faster pretty much regardless of the size of the file.

binast / binjs-ref

async/await is not part of ES6 #420

How to extend AST (IDL)

1. just add field to existing interface

es6.webidl

es7.webidl

Pros

Cons

2. add yet another interface for new type

es7.webidl

Pros

Cons

3. add yet another interface with flag

es7.webidl

Pros

Cons

How to extend format

What implementations should support