WordPress / gutenberg

The Block Editor project for WordPress and beyond. Plugin is available from the official repository.
https://wordpress.org/gutenberg/

Block Parser: Implement generator / stream / chunked block parse interface #19021

Open aduth opened 4 years ago

aduth commented 4 years ago

Previously: #7970, #9403

The current implementation of the block parser synchronously evaluates a content string and returns an array of parsed blocks. This behavior prevents us from leveraging some techniques to optimize the initial rendering of the editor. For example, there could be a perceived performance improvement if we were to parse the first ~10 blocks, render the block editor, then continue to parse the remaining blocks (or parse them only when the user scrolls the block list, using some form of infinite scroll or virtualized list rendering). Those efforts should be tracked as separate issues, but they are currently blocked by the parse behavior, which this issue aims to explore for enhancement.

See related concepts: "Time to First Paint" and "First Meaningful Paint"

Possible alternatives to this approach could include ideas around:

Additional challenges:

Proposal: Introduce a new interface to the blocks parser which would allow blocks to be parsed in chunks. This could be a new block parser package, or an addition to the existing block parser which could then be used in the @wordpress/blocks parse implementation.

Today:

const blocks = parse( contentString );

_Option A (Generator):_ (personal recommended approach)

const blocks = Array.from( generateParse( contentString ) );

Pros:

Cons:
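As a rough illustration of the generator shape, here is a minimal sketch. `generateParse` below is a toy, not the real `@wordpress/block-serialization-default-parser`: it matches only simple block comment pairs and ignores attributes, inner blocks, and freeform content. The point is the lazy-iteration interface, where the caller decides how many blocks to pull before rendering.

```javascript
// Sketch only: yields one block object per serialized block
// comment pair. Real parsing is far more involved.
function* generateParse( contentString ) {
	const pattern = /<!--\s+wp:(\S+)\s+-->([\s\S]*?)<!--\s+\/wp:\1\s+-->/g;
	let match;
	while ( ( match = pattern.exec( contentString ) ) !== null ) {
		yield {
			blockName: 'core/' + match[ 1 ],
			innerHTML: match[ 2 ].trim(),
		};
	}
}

// Callers control how much work happens up front: take a few
// blocks for the initial render, resume iteration later.
const iterator = generateParse(
	'<!-- wp:paragraph --><p>One</p><!-- /wp:paragraph -->' +
	'<!-- wp:paragraph --><p>Two</p><!-- /wp:paragraph -->'
);
const first = iterator.next().value;
// first.blockName === 'core/paragraph', first.innerHTML === '<p>One</p>'
```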

_Option B (Streams)_:

const blocks = [];
for await ( const block of streamParse( contentString ) ) {
  blocks.push( block );
}
// Note: Array.from() cannot consume an async iterable directly.
// Relies on experimental async iteration of streams: https://2ality.com/2019/11/nodejs-streams-async-iteration.html

Pros:

Cons:

_Option C (Chunked Results):_

This could go a few different ways:

const blocks = [];
const parser = createParser( contentString );
blocks.push( ...parser.take( 10 ) );
blocks.push( ...parser.take( 10 ) );

Pros:

Cons:
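One way Option C could be layered on top of Option A, sketched under the same assumptions as above (`createParser` and `take` are the hypothetical names from the proposal; the "parsing" is a placeholder that treats blank-line-separated chunks as blocks):

```javascript
// Sketch only: a chunked take() interface built over a generator,
// suggesting Option C need not be a separate parser implementation.
function createParser( contentString ) {
	function* blocks() {
		for ( const piece of contentString.split( '\n\n' ) ) {
			yield { blockName: 'core/paragraph', innerHTML: piece };
		}
	}
	const iterator = blocks();
	return {
		take( count ) {
			const taken = [];
			for ( let i = 0; i < count; i++ ) {
				const { value, done } = iterator.next();
				if ( done ) {
					break;
				}
				taken.push( value );
			}
			return taken;
		},
	};
}
```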

cc @dmsnell

mcsf commented 4 years ago

Thanks for the comprehensive proposal. I'd also lean towards generators, for the reasons you cite and for the fact that it is a construct which, albeit perhaps unfamiliar in WordPress development, is here to stay and is worth learning.

dmsnell commented 4 years ago

I've played around with the idea of parse-level filtering where you could filter each block as it pushes onto the parse output. We could accomplish these different proposals using just such a method.

// Generator variant. Note a callback can't `yield` on behalf of an
// enclosing generator, so blocks are buffered and re-yielded here:
function* myParse( doc ) {
  const blocks = [];
  parse( doc, {
    onParseBlock: ( block ) => blocks.push( block ),
  } );
  yield* blocks;
}

// Stream variant:
const myParse = doc => {
  const stream = new StreamOrWhatever();
  parse( doc, {
    onParseBlock: block => {
      stream.emit( 'block', block );
    }
  } );
  return stream;
};

// Promise variant:
const myParse = doc => Promise.resolve( parse( doc ) );

There are two levels of parsing, of course: top-level blocks and inner blocks. The only way I can imagine using a stream/generator approach is to send only top-level blocks, since we'd want to keep things in order; otherwise we'd need the unique IDs for each inner block communicated inside the parser.
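To illustrate the top-level-only boundary, a simplified sketch where each streamed unit is a complete subtree (parent plus nested `innerBlocks`), so document order is preserved and no per-inner-block identifiers need to cross the parser interface. Block shapes here are heavily simplified:

```javascript
// Sketch only: the streaming boundary sits at the top level;
// inner blocks travel attached to their parent.
function* streamTopLevel( blocks ) {
	for ( const block of blocks ) {
		yield block;
	}
}

// Three blocks total, but only two streamed units.
const tree = [
	{
		blockName: 'core/columns',
		innerBlocks: [ { blockName: 'core/column', innerBlocks: [] } ],
	},
	{ blockName: 'core/paragraph', innerBlocks: [] },
];
const units = Array.from( streamTopLevel( tree ) );
```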

@aduth one of the things I considered earlier when working with asynchronous parsing is the possibility of external updates. I think you pointed out that with a stream/message interface we could potentially handle that gracefully using a different message for updated blocks. Have you considered this scenario recently? For example, were we to store blocks in Simperium we could get external updates, same with WebRTC-connected collaborative sessions.

aduth commented 4 years ago

I think @epiqueras or @ellatrix might have more insight about whether or how this could relate to something like collaborative editing. I can see some similarities in how, with collaborative editing, we need to send messages describing an update to a block in a way that is treated less as a "change resulting from user action" and more as a snapshot of the "latest truth of the block value". As that relates to parsing, such a system could allow us to iteratively build the editor's blocks state using these signals, rather than leaning so heavily on the parser to provide a specialized interface.

Edit: To clarify a specific point here:

less as a "change resulting from user action"

By this, I don't mean to imply that the messages couldn't be described as independent changesets of specific block attributes. I'm considering it more in the context of the session's undo/redo history, where we would currently encounter challenges if we tried to add blocks iteratively to an existing block editor state.

epiqueras commented 4 years ago

The collaborative editing bindings would sync in all available blocks from peers.

This means that if a peer has scrolled further down and parsed more blocks, the parser should not append blocks that are already there. We would also need to modify the bindings to keep tombstones of deleted blocks to make sure the parser doesn't reintroduce them.

All of this will be very complex. I'm not sure if it's worth the effort for our use cases.

Alternatively, we have discussed exposing the block tree JSON data in the REST API. If we do that, we won't have the parsing lag anywhere, and peers can start syncing immediately.