WordPress / gutenberg

The Block Editor project for WordPress and beyond. Plugin is available from the official repository.
https://wordpress.org/gutenberg/

Block Parser: Implement generator / stream / chunked block parse interface #19021

Open aduth opened 4 years ago

aduth commented 4 years ago

Previously: #7970, #9403

The current implementation of the block parser synchronously evaluates a content string and returns an array of parsed blocks. This behavior prevents us from leveraging some techniques to optimize the initial rendering of the editor. For example, there could be a perceived performance improvement if we were to parse the first ~10 blocks, render the block editor, then continue to parse the remaining blocks (or parse them only when the user scrolls the block list, using some form of infinite scroll or virtualized list rendering). Those efforts should be tracked as separate issues, but they are currently blocked by the parse behavior, which this issue aims to explore for enhancement.

See related concepts: "Time to First Paint" and "First Meaningful Paint"

Possible alternatives to this approach could include ideas around:

Additional challenges:

Proposal: Introduce a new interface to the blocks parser which would allow blocks to be parsed in chunks. This could be a new block parser package, or an addition to the existing block parser which could then be used in the @wordpress/blocks parse implementation.

Today:

const blocks = parse( contentString );

_Option A (Generator):_ (personal recommended approach)

const blocks = Array.from( generateParse( contentString ) );

Pros:

Cons:
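As a rough illustration of the generator shape, here is a minimal sketch. `generateParse` below is a toy, not the real `@wordpress/block-serialization-default-parser`: it matches only simple block comment pairs and ignores attributes, inner blocks, and freeform content. The point is the lazy-iteration interface, where the caller decides how many blocks to pull before rendering.

```javascript
// Sketch only: yields one block object per serialized block
// comment pair. Real parsing is far more involved.
function* generateParse( contentString ) {
	const pattern = /<!--\s+wp:(\S+)\s+-->([\s\S]*?)<!--\s+\/wp:\1\s+-->/g;
	let match;
	while ( ( match = pattern.exec( contentString ) ) !== null ) {
		yield {
			blockName: 'core/' + match[ 1 ],
			innerHTML: match[ 2 ].trim(),
		};
	}
}

// Callers control how much work happens up front: take a few
// blocks for the initial render, resume iteration later.
const iterator = generateParse(
	'<!-- wp:paragraph --><p>One</p><!-- /wp:paragraph -->' +
	'<!-- wp:paragraph --><p>Two</p><!-- /wp:paragraph -->'
);
const first = iterator.next().value;
// first.blockName === 'core/paragraph', first.innerHTML === '<p>One</p>'
```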

_Option B (Streams)_:

const blocks = [];
for await ( const block of streamParse( contentString ) ) {
  blocks.push( block );
}
// Note: Array.from() cannot consume an async iterable directly.
// Relies on experimental async iteration of streams: https://2ality.com/2019/11/nodejs-streams-async-iteration.html

Pros:

Cons:

_Option C (Chunked Results):_

This could go a few different ways:

const blocks = [];
const parser = createParser( contentString );
blocks.push( ...parser.take( 10 ) );
blocks.push( ...parser.take( 10 ) );

Pros:

Cons:
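One way Option C could be layered on top of Option A, sketched under the same assumptions as above (`createParser` and `take` are the hypothetical names from the proposal; the "parsing" is a placeholder that treats blank-line-separated chunks as blocks):

```javascript
// Sketch only: a chunked take() interface built over a generator,
// suggesting Option C need not be a separate parser implementation.
function createParser( contentString ) {
	function* blocks() {
		for ( const piece of contentString.split( '\n\n' ) ) {
			yield { blockName: 'core/paragraph', innerHTML: piece };
		}
	}
	const iterator = blocks();
	return {
		take( count ) {
			const taken = [];
			for ( let i = 0; i < count; i++ ) {
				const { value, done } = iterator.next();
				if ( done ) {
					break;
				}
				taken.push( value );
			}
			return taken;
		},
	};
}
```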

cc @dmsnell

mcsf commented 4 years ago

Thanks for the comprehensive proposal. I'd also lean towards generators, for the reasons you cite and for the fact that it is a construct which, albeit perhaps unfamiliar in WordPress development, is here to stay and is worth learning.

dmsnell commented 4 years ago

I've played around with the idea of parse-level filtering where you could filter each block as it pushes onto the parse output. We could accomplish these different proposals using just such a method.

// Generator variant. Note a callback can't `yield` on behalf of an
// enclosing generator, so blocks are buffered and re-yielded here:
function* myParse( doc ) {
  const blocks = [];
  parse( doc, {
    onParseBlock: ( block ) => blocks.push( block ),
  } );
  yield* blocks;
}

// Stream variant:
const myParse = doc => {
  const stream = new StreamOrWhatever();
  parse( doc, {
    onParseBlock: block => {
      stream.emit( 'block', block );
    }
  } );
  return stream;
};

// Promise variant:
const myParse = doc => Promise.resolve( parse( doc ) );

There are two levels of parsing, of course: top-level blocks and inner blocks. The only way I can imagine using a stream/generator approach is to send only top-level blocks, since we'd want to keep things in order; otherwise we'd need the unique IDs for each inner block communicated inside the parser.
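To illustrate the top-level-only boundary, a simplified sketch where each streamed unit is a complete subtree (parent plus nested `innerBlocks`), so document order is preserved and no per-inner-block identifiers need to cross the parser interface. Block shapes here are heavily simplified:

```javascript
// Sketch only: the streaming boundary sits at the top level;
// inner blocks travel attached to their parent.
function* streamTopLevel( blocks ) {
	for ( const block of blocks ) {
		yield block;
	}
}

// Three blocks total, but only two streamed units.
const tree = [
	{
		blockName: 'core/columns',
		innerBlocks: [ { blockName: 'core/column', innerBlocks: [] } ],
	},
	{ blockName: 'core/paragraph', innerBlocks: [] },
];
const units = Array.from( streamTopLevel( tree ) );
```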

@aduth one of the things I considered earlier when working with asynchronous parsing is the possibility of external updates. I think you pointed out that with a stream/message interface we could potentially handle that gracefully using a different message for updated blocks. Have you considered this scenario recently? For example, were we to store blocks in Simperium we could get external updates, same with WebRTC-connected collaborative sessions.

aduth commented 4 years ago

I think @epiqueras or @ellatrix might have more insight about whether or how this could relate to something like collaborative editing. I can see some similarities in how, with collaborative editing, we need to send messages describing an update to a block in a way that is treated less as a "change resulting from user action" and more as a snapshot of the "latest truth of the block value". As that relates to parsing, such a system could allow us to iteratively build the editor's blocks state using these signals, rather than leaning so heavily on the parser to provide a specialized interface.

Edit: To clarify a specific point here:

less as a "change resulting from user action"

By this, I don't mean to imply that the messages couldn't be described as independent changesets of specific block attributes. I'm considering it more in the context of the session's undo/redo history, where we would currently encounter challenges if we tried to add blocks iteratively to an existing block editor state.

epiqueras commented 4 years ago

The collaborative editing bindings would sync in all available blocks from peers.

This means that if a peer has scrolled further down and parsed more blocks, the parser should not append blocks that are already there. We would also need to modify the bindings to keep tombstones of deleted blocks to make sure the parser doesn't reintroduce them.

All of this will be very complex. I'm not sure if it's worth the effort for our use cases.

Alternatively, we have discussed exposing the block tree JSON data in the REST API. If we do that, we won't have the parsing lag anywhere, and peers can start syncing immediately.