Yoric commented 6 years ago

Generally, we want all data that is required during startup to appear before data that is only required later.

So, what data do we need during startup?

The toplevel;
Functions/Methods that are executed immediately (and, recursively, any nested functions that they execute immediately).

Our assumption here is that we should optimize startup speed for code that is outside of any Skippable node, and that the encoder should figure out the rest.

One way to do this would be to change the format from what we have now:

[grammar]
// All node definitions
[strings]
// All string definitions
[ast]
// Ast definitions

into

[grammar]
// Node definitions used during startup.
[strings]
// String definitions used during startup.
[ast]
// Ast definitions used during startup.
[grammar]
// Node definitions used only after startup.
[strings]
// String definitions used only after startup.
[ast]
// Ast definitions used only after startup.
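
As a rough illustration of the encoder's side of this split, here is a minimal sketch for the [strings] table only. The function name `split_strings` and the idea that the encoder already knows which strings are needed during startup are assumptions made for the example, not something binjs-ref provides; the [grammar] and [ast] tables would presumably be split the same way.

```python
def split_strings(all_strings, startup_strings):
    """Split one string table into a startup table and a post-startup table.

    `all_strings` is the original, single [strings] table.
    `startup_strings` is the (assumed known) set of strings needed at startup.
    Relative order is preserved inside each table.
    """
    startup, post_startup = [], []
    for s in all_strings:
        if s in startup_strings:
            startup.append(s)
        else:
            post_startup.append(s)
    return startup, post_startup


# Hypothetical input: only "console" and "log" are needed during startup.
startup, post_startup = split_strings(
    ["console", "onClick", "log", "render"],
    startup_strings={"console", "log"},
)
```

How the encoder actually computes the startup set is left open here; that choice is the optimization lever mentioned below.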

Semantics

if parsing a node definition that is used during startup requires something that appears in a post-startup table, raise a SyntaxError – this does not include code hidden in a Skippable;
if executing a node during startup requires something that appears in a post-startup table (through dethunkification), raise a DelayedSyntaxError.

In either case, the encoder is in charge of deciding where to best put grammar/strings/ast definitions. This is both an optimization lever and a question of semantics.

Rationale for the second point: attempting to execute a node that depends on something that is provided in a later table means blocking the run-to-completion until we have finished receiving network data. This is both complicated to implement and hard to specify, as receiving network data is observable by the DOM, which could in turn trigger JS code.
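
The two rules above can be sketched as follows. This is only an illustration under assumptions: `StagedTable`, `resolve_while_parsing` and `resolve_while_executing` are hypothetical names, Python's built-in `SyntaxError` stands in for the JS one, and the table is modeled as a startup part followed by a post-startup part sharing one index space.

```python
class DelayedSyntaxError(Exception):
    """Hypothetical stand-in for the DelayedSyntaxError discussed above."""


class StagedTable:
    """A table split into a startup part and a post-startup part (assumed layout)."""

    def __init__(self, startup, post_startup):
        self.startup = list(startup)
        self.post_startup = list(post_startup)

    def is_post_startup(self, index):
        # Assumption: indices past the end of the startup part are post-startup.
        return index >= len(self.startup)


def resolve_while_parsing(table, index):
    """Rule 1: a startup definition must not need a post-startup entry."""
    if table.is_post_startup(index):
        raise SyntaxError("startup definition refers to a post-startup entry")
    return table.startup[index]


def resolve_while_executing(table, index):
    """Rule 2: fail instead of blocking run-to-completion on network data."""
    if table.is_post_startup(index):
        raise DelayedSyntaxError("startup execution needs a post-startup entry")
    return table.startup[index]
```

Neither function ever waits for more data, which is the point of the rationale: the failure is reported rather than suspending execution mid-run.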

Further

Ideally, we'd like to get full streaming compilation/interpretation. This may mean more than 2 levels.

[grammar]
// Node definitions used during stage 1 (startup).
[strings]
// String definitions used during stage 1 (startup).
[ast]
// Ast definitions used during stage 1 (startup).
[grammar]
// Node definitions used during stage 2.
[strings]
// String definitions used during stage 2.
[ast]
// Ast definitions used during stage 2.
[grammar]
// Node definitions used during stage 3.
[strings]
// String definitions used during stage 3.
[ast]
// Ast definitions used during stage 3.
// ...

With the definition that any lookup in a table first looks in stage 1, then, if the stage 1 table is not long enough, in stage 2, and so on.
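
One possible reading of this lookup rule, as a minimal sketch: indices are global, and each stage's table continues where the previous one ended. That interpretation, and the name `staged_lookup`, are assumptions for the example and are not settled by this issue.

```python
def staged_lookup(stages, index):
    """Look up a global `index` across the per-stage tables received so far.

    `stages` is a list of tables (lists), stage 1 first. If stage 1 is not
    long enough, the remainder of the index is looked up in stage 2, etc.
    """
    if index < 0:
        raise LookupError("negative index")
    for table in stages:
        if index < len(table):
            return table[index]
        index -= len(table)  # skip past this stage's entries
    raise LookupError("index not available in any stage received so far")


# Hypothetical string tables for three stages.
stages = [["a", "b"], ["c"], ["d", "e"]]
staged_lookup(stages, 0)  # "a", found in stage 1
staged_lookup(stages, 2)  # "c", stage 1 is too short, so stage 2 is used
```

What should happen when the index falls outside every stage received so far is exactly the error-semantics question raised in this thread; the sketch just raises `LookupError` as a placeholder.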

Again, we'll let the encoder decide where best to place the data. Again, we'll need to decide on the semantics for errors.

syg commented 6 years ago

I think full streaming compilation is the realistic goal, see issue at https://github.com/binast/ecmascript-binary-ast/issues/12

I think streaming interpretation, given the deferred function-at-a-time error model we're going with, will end up slowing down the more widely useful streaming parsing use case.

Yoric commented 6 years ago

We agreed over in binast/ecmascript-binary-ast#12, in particular to not call "streaming interpretation" what I was calling "streaming interpretation".

Yoric commented 6 years ago

Discussing with @lukewagner, we realized that most VMs will have difficulties implementing the semantics in which we wait during execution for the loading of a function that appears further down in the stream.

So, amending

if executing a node during startup requires something that appears in a post-startup table (through dethunkification), should we raise a DelayedSyntaxError or just take the performance hit?

into

if executing a node during startup requires something that appears in a post-startup table (through dethunkification), we raise a DelayedSyntaxError.

binast / binjs-ref

Make format startup-/streaming-friendly #86
