WordPress / gutenberg

The Block Editor project for WordPress and beyond. Plugin is available from the official repository.
https://wordpress.org/gutenberg/
Other
10.51k stars 4.2k forks source link

Classification of block validation types #21703

Open mtias opened 4 years ago

mtias commented 4 years ago

This proposal seeks to build upon the existing block validation system to start being more proactive in reducing the cases where the final user is faced with an invalid block type dialog. This is a proposal that involves manipulating data on behalf of the user so it needs both a solid definition system (in steps) to ground the level of inference the engine is willing to do to restore content integrity.

Block Validation Flow

Block validation, as defined here, comes after the initial parsing step which yields block types and checks whether a corresponding block type handler exists (meaning, that a corresponding block type is registered in the system). That phase is still objectively part of the validation process but is not the focus of this issue since there are no further feasible optimization to do there regarding content reconstruction. What this one focuses on is what happens when the block source is run through its save function, producing a new source.

// A block source is run through the `save` function of its `blockType`
blockType.save( source ) => newSource;

// The resulting operation is classified for every block given the following outcomes
block.isValid: Number;

The proposal here focuses on classifying the outcomes of the above operation by logically grouping the possible scenarios in a decreasing order of certainty over content integrity.

ValidBlock: 0            // idempotent operation of `save(source) => source`.

MigratedBlock: 1         // source is matched sequentially with defined deprecations
                         // until it produces a match.

PreservedSource: 2       // `newSource` produces equivalent `innerHtml` even if
                         //  comment attributes differ and becomes idempotent
                         // after first reconciliation.

ReconstructedSource: 3   // `newSource` contains the same attributes as `source`
                         // (attribute integrity), including non-empty sourced
                         // attributes, while `innerHtml` is allowed to be rebuilt.

RawTransformedSource: 4  // source is passed to raw handling functions and it yields
                         // the same block type.

InvalidBlock: 5          // the block could not be safely restored, need user input

This operation is run for all blocks and is the cheapest mechanism to ensure integrity. It has to be optimized for speed. This has already served us super well as a quick heuristic.

When the basic match operation in level 0 fails, the first path is to check whether deprecated shapes exist for the block. If they do, we run the source against each of those sequentially until we find a match. This is also the step in which source can be migrated to newer sources. It has been in place for a long time and has allowed transparently upgrading the shape of a blocks numerous times.

Starting at this level the block validation mechanism has been unable to reconcile the source with the output, which means there's a problem it doesn't have explicit instructions on how to handle. From here on there's potential of some data loss and is the main point of discussion.

For this level 2 we'd suspend the integrity of comment attributes in favor of the integrity of the source, provided the innerHtml of source and the innerHtml of the output match. It's also important that we achieve idempotency immediately after reconciling the comment attributes.

// Example
// - source
<!-- wp:heading {"level":3} -->
<h2>Testing Header</h2>
<!-- /wp:heading -->

// - output
<!-- wp:heading -->
<h2>Testing Header</h2>
<!-- /wp:heading -->

// inner html matches both instances

This level can be seen as "clean up spurious comment attributes malformations". Worth pointing out that this relies on the fact the block author has made a choice for us regarding what matters for sourcing the content attributes, since it is prioritizing the h2 content tag and not the comment attribute "level".

In this variant, the html comments coincide but the inner html doesn't. If there is attribute integrity in oldSource and newSource (and sourced attributes are not empty) we let the block output be overwritten as the result of computing save again. This is an indication the block author leans on the comment attribute as the source of truth rather than the html source, and we honor that decision.

// Example
// - source
<!-- wp:heading {"level":6,"textColor":"pale-pink"} -->
<h6>Testing Header</h6>
<!-- /wp:heading -->

// - output
<!-- wp:heading {"level":6,"textColor":"pale-pink"} -->
<h6 class="has-pale-pink-color has-text-color">Testing Header</h6>
<!-- /wp:heading -->

// restores generated class names

This step can be seen as an "implicit migration" for indefinite forms, in contrast with those defined in deprecations.

At this level we have lost confidence in the block functions being able to reconstruct a source so we hand over the operation to the raw handling mechanism (this is currently exposed to users in the "convert to blocks" action on invalid block types).

What we seek here is to ensure the resulting blockType.name is actually a match between source and output. This is the most aggressive manipulation since it doesn't rely directly on specific instructions from a block's save method but it can employ its raw transformations (as if content was being pasted or was part of a freeform block "convert to blocks" operation).

// Example : where it succeeds restoring block type
// - source
<!-- wp:heading -->
<h2>Testing Header</p>
<!-- /wp:heading -->

// - output
<!-- wp:heading -->
<h2>Testing Header<p></p></h2>
<!-- /wp:heading -->

And here's an example where it currently fails to restore the block type.

// Example : where it fails restoring block type
// - source
<!-- wp:heading -->
<span>Testing Header</h2>
<!-- /wp:heading -->

// - output
<!-- wp:paragraph -->
<p><span>Testing Header</span></p>
<!-- /wp:paragraph -->

As noted, in the last example the wp:heading is not preserved so we would not consider it passes level 4.

User Experience

If we keep the type of invalidation that occurred stored in the block object, it would allow us to build UI and behaviour around it. For example, we might choose to show somewhere in the interface (block inspector or block toolbar) that a transformation above level 1 has taken place, so the user could review it if they see something amiss and take a different choice than what the system has done.

There's obviously also room to work upon the save( source ) => newSource operation to discard or omit certain elements we might consider less strict (like class names or other html attributes in the wrapper elements) which would cascade through all the levels and likely prevent going from one level to another for some cases.


Interested in hearing your thoughts on this. Do the classifications make sense? Is the order what you'd expect? Is there another vector we might choose to classify as well?

There might be a chance to have another classification after level 3, for example, where we attempt the reconstruction of both the comment and the inner html provided the block type name remains the same before passing on to pure raw handling mechanisms.

cc @aduth @mcsf @youknowriad @dmsnell

youknowriad commented 4 years ago

Interesting here :) what's unclear to me is:

mtias commented 4 years ago

My inclination would be to not expose this directly but only indirectly: adding deprecations is an indirect way of influencing what is valid / invalid; adding transformations / migrations is a way of opting in into raw handling scenarios, etc.

dmsnell commented 4 years ago

Thanks @mtias for sharing! I love the importance of handling "invalid blocks"

One immediate thought that sticks out is that we might want to reverse the numbering or change the names. Regardless of concepts here the proposal sets a valid block to isValid: 0 which would lead to confusing code like this and I'm sure a slurry of bugs due to JS's weak typing being happy to compare numbers with booleans…

if ( block.isValid ) {
    // block is invalid?
}

A reframing could be a more passive property of the block:

if ( block.decay ) {
    // something was "healed" in this block
}

As a classification system I like this delineation - it's helpful for understanding how to handle the different types of blocks. I believe we still have some cases missing though:


In most of these cases I feel like the bigger value is in handling this as an editor framework rather than in exposing it to the blocks, though I see not problem in exposing it.

If we could show two panels next to each other as a before/after and let the author choose which to keep then we could use these classifications to give the context necessary to help decide.

aduth commented 4 years ago

I like the general idea here. I think it gets most interesting at what's described as Stage 3 and above, in having a systematic way to categorize the extent of invalidation in a way that can be used to present the user with options. I seem to recall at some point in the past there was some experimentation in the idea of considering block attribute values for validation, disregarding whatever the HTML might be. It clearly has the potential for being destructive to markup, which is why it wasn't ever implemented. But if it's something which can be presented as an option to the user, it might be more appetizing to consider.

At least coming from my own understanding of the background of the current validation procedure, I have some confusion or concern for what is considered as proposal of Stages 0 through 2.

With regards to the original topic #7604, one of the things I've been thinking about over time is whether the framework can be doing more to be aware of differences in a way which they can be reapplied to resulting markup without intervention. This already exists to an extent with class names ("custom class name"). If a class attribute is detected in the markup of the top-level element of a block which is not expected, it is "held" and then reapplied to the resulting save. Without this, the additional class would be considered as invalidating. I've contemplated whether this could be expanded to behave the same way for additional attributes, even without exposing it to the user directly.

My hunch is that a lot of the invalidations we'll see are these sorts of cases where, for example, a plugin had decided to add a hook which modifies markup to add an attribute, or the user manually edits the markup to add an attribute.

Is this something you see as being within the scope of this issue? If so, where would it fit?

mtias commented 4 years ago

In retrospect, it seems clear that there should be a shortcut here for identical markup.

Agreed.

In general, I don't know how this "equivalence" logic is considered as part of this proposal, or if it's viewed as interchangeable with the "basic match" of Stage 0

Good point. It assumes we can do equivalency as isEqual || isEquivalent considering the first one as the fastest and the second as more expensive throughout the stages as aplicable.

It's not clear to me if we'd want to introduce such invalidations based on difference in the comment syntax, or if it's something we should continue to ignore.

Stage 3 is worthwhile to me (the example is a good illustration) in that it can possibly relax some of the requirements for duality in attributes that have expression in the markup. Perhaps worth clarifying that I don't envision it as working with the markup of the comment but with attribute consistency for non-sourced attributes. There are blocks where the inner HTML (or most of it) is just a one way reconstruction from comment attributes and it's unnecessary to attempt to normalize HTML when it can be left to be rebuilt.

Is this something you see as being within the scope of this issue? If so, where would it fit?

I think it's orthogonal to this issue in that I include it conceptually in "normalization" (being flexible about certain specific attributes like classes) but I worry about adding too many ad hoc exclusions without a solid mapping of the full flow, since it's going to be a bit more fragile to handle by definition.

aduth commented 4 years ago

In retrospect, it seems clear that there should be a shortcut here for identical markup.

Agreed.

See #22506

mtias commented 4 years ago

One immediate thought that sticks out is that we might want to reverse the numbering or change the names.

@dmsnell Yes, for sure, or just flip 0: Invalid Block and 5: Valid Block determinations. For presentation it seemed better to show the increasing progression, though.

1.5. Extra attributes

This was meant to be covered in the example in 2, where there is an extra attribute that disappears after reconciliation. Do you think it wouldn't be enough?

2.5. Extra styles

@aduth hints at it in his response but I consider attributes we are fine with preserving even if they don't originate from the edit / save functions of a block as an internal part of our equality, normalization, and serialization steps.

In most of these cases I feel like the bigger value is in handling this as an editor framework rather than in exposing it to the blocks

Indeed. I'd say that for this to approach any sense of robustness it should not be exposed as a public API. A block author should interface with more expressive APIs (deprecations, transforms, save, etc) rather than the internal heuristics.

Also for extra clarity I'm not anticipating any user facing decisions yet. I'd like to establish the system first and see what consequences it produces before burdening users with making a decision.

dmsnell commented 4 years ago

For presentation it seemed better to show the increasing progression, though.

Yeah I saw the tension, but practically in the code I think that having something named in the positive meaning it's okay but having a falsely value would be confusing at best. Both cases it's a presentation problem.

covered in the example in 2, where there is an extra attribute that disappears after reconciliation. Do you think it wouldn't be enough?

@aduth fleshed out my point better for me. these aren't "invalid" attributes - they're simply attributes the block is unaware of but which some other plugin might be operating on. it would be very convenient if we add these and have the editor keep them around - "no harm, no foul"

one situation that came to my mind was adding something like loadClientScript which could allow blocks to specify client-side JS they need to load so that we can build JS-only blocks. I recognize the problems inherent in this example but the same idea could apply to styles also. suppose I want to make a plugin operating over multiple blocks and block types - if you add additionalStyles to the attributes I'll mark them on page render.

the difference between this and extra CSS styles is that these would be consumed by some filtering function in the PHP/backend whereas styles are just that - CSS

SergioEstevao commented 4 years ago

1.5. Extra attributes and 2.5. Extra styles

In GB-mobile perspective these are something that we really need because the mobile version can be behind in terms of versions of blocks, or we can be connected to a server that installed extra plugins that change core blocks ( for example Coblocks) the ability to be able to parse and accept those blocks will be a great addition to the mobile apps.

nerrad commented 4 years ago

I love the direction this is taking in terms of providing a clear communication mechanism for varying levels of success/fail in the block validation process. The levels make sense to me and on the surface it looks like it offers some logical presentation points for the author to take. Presumably the ui/ux presented to authors could also generate varying degrees of action needed depending on the resulting level (Example a "fyi this happened and you can adjust if you want: click link" vs "This block needs your action before it can be rendered").

I agree with @dmsnell 's feedback here about the numbering, although that likely can be mitigated somewhat by using "constant" names for the levels instead.

The only other thing I have to add to the discussion here is whether there's any value in considering if plugins can inject their own invalidation handling if needed? If this was considered, we'd probably want to limit it to when a block reaches level five, there is mechanism for plugins to hook in and offer their own validation handling at that point. An advantage of this mechanism is it could provide an escape hatch for various edgecases plugins might have where the existing validation logic simply doesn't work. Presumably, if in the future GB was to enhance validation further (additional levels, or more intelligent logic), it also would provide a way for plugins to roll out supporting that logic earlier in the case where they still need to support earlier WP versions without it.

mcsf commented 3 years ago

I'm adding to this issue the problem posed by block patterns when the blocks they contain conflict with the runtime (either the running Gutenberg is outdated, or the block pattern is): https://github.com/WordPress/wordpress-develop/pull/1483#issuecomment-877982616. This might just be solved with pattern versioning, but it may inform our RFC.

desrosj commented 3 years ago

Just wanted to follow up on WordPress/wordpress-development#1483. I've elected not to include the pattern updates in the bundled theme release accompanying WordPress 5.8. This will leave console.info() notices, but it will prevent block validation problems in a number of theme and WP version combinations. See my comment on the ticket for full details.

Having a way to version patterns, an API that generates the appropriate block markup for a version of WordPress based on a set of data, or validation that's more "safe" with old block syntax would solve this issue. But as it is now it's really difficult to update patterns in Core while continuing to support multiple versions of WordPress.