Allow support for multiple output formats.

herregroen commented 6 years ago

The current scenario

Currently Gutenberg only outputs HTML. This makes sense, as traditionally that's been the output format of WordPress. While different output formats already exist, the RSS feed or the WP REST API for example, these still output the actual content of posts as either HTML or text ( by stripping HTML tags ), as this was the only possible output format of TinyMCE.

Gutenberg however provides a powerful new opportunity to allow different output formats as blocks offer a vastly richer and more structured approach to content editing.

What could be

Let's say we've created an awesome new map block. It allows a user to add a map to their post and mark a location on this map. This could be the location of their business or the location of an event they're blogging about for example.

Currently this would only be output as HTML, for example as an embed from Google Maps. If you wanted to, for example, query this post through the REST API you'd have to write a fair bit of code to ensure our map data ( latitude & longitude for example ) is also saved as post meta, exposed to the REST API, deleted when we remove our map and updated when we change it.

But what if our block could somehow say that, besides HTML output, it also supports JSON output? For example, it could define a toJSON method that returns an object containing longitude, latitude, an address and more, and this generated JSON would be output whenever a JSON representation of the post was requested. By adding a single method to our block, it'd become much easier to enrich the output of the REST API and make it that much more usable and powerful.
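A minimal sketch of what such a block could look like. Everything here is an assumption for illustration: the block name, the attribute names, the placeholder embed URL, and the toJSON method itself, which is only proposed in this issue and is not part of the block API. The save method returns a plain string to keep the sketch self-contained.

```javascript
// Hypothetical map block. `toJSON` is the method proposed in this issue,
// not an existing part of the block API.
const mapBlock = {
	name: 'my-plugin/map',
	attributes: {
		latitude: { type: 'number' },
		longitude: { type: 'number' },
		address: { type: 'string' },
	},
	// HTML output, simplified to a string; the embed URL is a placeholder.
	save( { attributes } ) {
		const { latitude, longitude } = attributes;
		return `<iframe src="https://maps.example.com/embed?q=${ latitude },${ longitude }"></iframe>`;
	},
	// Proposed: structured output for JSON consumers such as the REST API.
	toJSON( { attributes } ) {
		const { latitude, longitude, address } = attributes;
		return { type: 'map', latitude, longitude, address };
	},
};

const json = mapBlock.toJSON( {
	attributes: { latitude: 52.37, longitude: 4.89, address: 'Dam 1, Amsterdam' },
} );
console.log( JSON.stringify( json ) );
```

With something like this in place, the map data would not need to be mirrored into post meta by hand.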

Our map component could, for example, also implement a toSchema method and output JSON-LD compatible with https://schema.org/location, making it easier for search engines and other crawlers to understand what's on the page.
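As a rough illustration, a toSchema method could return a schema.org Place with GeoCoordinates. The function name and the attribute names are assumptions, and toSchema is only the method proposed here, not an existing API.

```javascript
// Hypothetical `toSchema` implementation for the map block. The attribute
// names (latitude, longitude, address) are assumptions for this sketch.
function mapToSchema( { latitude, longitude, address } ) {
	return {
		'@context': 'https://schema.org',
		'@type': 'Place',
		address,
		geo: {
			'@type': 'GeoCoordinates',
			latitude,
			longitude,
		},
	};
}

const jsonLd = mapToSchema( {
	latitude: 52.37,
	longitude: 4.89,
	address: 'Dam 1, Amsterdam',
} );
console.log( JSON.stringify( jsonLd, null, 2 ) );
```

The resulting object would typically be embedded in the page inside a script tag of type application/ld+json.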

Even better: let's say we want to make sure our map block properly supports AMP. In AMP we wouldn't want to output an <iframe> tag; instead we'd want to output an <amp-iframe> tag. Perhaps in our normal HTML output we're adding additional controls and JavaScript to the map that wouldn't work in AMP. What if our block could implement a toAMP method that returned a representation of itself optimised for AMP pages, while the regular HTML output could still use all the options available in that format? The alternative is keeping our HTML output as simple as possible so that a search & replace of <iframe> with <amp-iframe> would suffice.
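A sketch of how the two outputs could diverge. The toAMP method is the hypothetical method described above, and the embed URL, dimensions and data attribute are placeholders; both methods return strings to keep the example self-contained.

```javascript
// Placeholder embed URL builder; not a real maps endpoint.
const embedUrl = ( { latitude, longitude } ) =>
	`https://maps.example.com/embed?q=${ latitude },${ longitude }`;

// Hypothetical block with separate HTML and AMP output.
const ampAwareMapBlock = {
	save( { attributes } ) {
		// Regular output is free to use extra markup and script-driven controls.
		return `<div class="map-block" data-controls="zoom"><iframe src="${ embedUrl( attributes ) }"></iframe></div>`;
	},
	toAMP( { attributes } ) {
		// AMP output uses the amp-iframe component instead of a plain iframe.
		return `<amp-iframe src="${ embedUrl( attributes ) }" width="600" height="400" layout="responsive" sandbox="allow-scripts allow-same-origin"></amp-iframe>`;
	},
};

const mapAttributes = { latitude: 52.37, longitude: 4.89 };
console.log( ampAwareMapBlock.save( { attributes: mapAttributes } ) );
console.log( ampAwareMapBlock.toAMP( { attributes: mapAttributes } ) );
```

Because the AMP markup is produced by its own method, the regular HTML does not have to stay simple enough for a search & replace.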

I believe this could be a better way of dealing with this particular problem as opposed to the current approach taken in Automattic/amp-wp#1230 or Automattic/amp-wp#1039 where a post has to be either AMP or not AMP and use AMP specific blocks. If all of the blocks used in it support AMP then why not both? And why AMP specific blocks? Why not normal blocks capable of outputting AMP HTML when required. This wasn't possible with the classic editor, but it certainly could be with Gutenberg. This would make it much easier

I'll go one final step further, although this would certainly be more complex to actually fully implement.

What if our map block could implement a toReactNative method, together with an editInReactNative method? These could, for example, use special components from a new wp.native namespace that have empty mock implementations in the browser but native implementations in the mobile app. This would allow a block to define its own native interface both when editing and when outputting, even opening the door to apps powered by WordPress. Unlike the other formats, you wouldn't want to save this output; instead you'd parse the blocks as you would when initialising the editor and then output them using these methods.

Implementation

The save method should remain the default output method for creating HTML, and it should stay required for every block. I don't think there's anything wrong with Gutenberg being an HTML-first editor.

Beyond that, just as plugins can register additional blocks, a potential implementation could give plugins the option to register additional output formats.

I think usage of this feature should look something like this:

Register a new output format with registerOutputFormat( 'json', 'toJSON', rootJSONBuilder ). The first argument is a unique name for the format, the second is the name of the method on a block that outputs this format, and the third is a function that constructs the root object from an array of serialized children.
Each block could implement a toJSON() method. Blocks that make use of InnerBlocks could instead implement toJSON( serializedInnerBlocks ). The third argument to registerOutputFormat would function pretty much the same as this latter toJSON function.
By converting the blocks at the bottom of the tree first and passing these to their parent blocks, possibly in a more complex object corresponding to the layouts passed to the InnerBlocks, each block can determine for itself how to nest other blocks without having to serialize them. The root builder ( rootJSONBuilder in the example above ) then combines all blocks without a parent into a single final representation of the post.
If a block doesn't implement the method, or the method returns null, the block is treated exactly as if save had returned null: it's not output at all. For the inner blocks of a block that doesn't support a format, a decision would need to be made: either pass their output along the chain so that it is still output, or discard it. I think the former would be the preferable option here. It would also be possible to look into the option of registering a callback in PHP:
```
register_block_type( 'my-plugin/latest-post', array(
	'render_callback' => 'my_plugin_render_block_latest_post',
	// Proposed: 'json_callback' is not an existing register_block_type() argument.
	'json_callback'   => 'my_plugin_render_json_latest_post',
) );
```
The output of every non-HTML format would automatically be saved as post meta so that it is easily accessible and ready to be output when required. Optionally, support could be added to automatically convert JSON-compatible output to PHP arrays or stdClass objects, so these can be read server-side without having to parse anything.
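The registration and bottom-up conversion described above could be sketched roughly as follows. None of these functions exist in Gutenberg: registerOutputFormat is the proposed API, while serializeBlock, serializeBlocks, the blockTypes registry and the block names are hypothetical helpers for this example. As a further assumption, each format method receives the block as its first argument and the serialized inner blocks as its second.

```javascript
// Registry of output formats: name -> { methodName, rootBuilder }.
const outputFormats = new Map();

function registerOutputFormat( name, methodName, rootBuilder ) {
	outputFormats.set( name, { methodName, rootBuilder } );
}

// Serializes one block into an array of zero or more outputs.
function serializeBlock( block, methodName, blockTypes ) {
	// Convert the bottom of the tree first.
	const children = block.innerBlocks.flatMap( ( inner ) =>
		serializeBlock( inner, methodName, blockTypes )
	);
	const blockType = blockTypes[ block.name ];
	const method = blockType && blockType[ methodName ];
	if ( typeof method !== 'function' ) {
		// Unsupported block: output nothing for the block itself, but pass
		// the output of its inner blocks along the chain.
		return children;
	}
	const output = method( block, children );
	// A null result means the block is not output at all.
	return output === null || output === undefined ? [] : [ output ];
}

function serializeBlocks( blocks, formatName, blockTypes ) {
	const { methodName, rootBuilder } = outputFormats.get( formatName );
	const topLevel = blocks.flatMap( ( block ) =>
		serializeBlock( block, methodName, blockTypes )
	);
	// The root builder combines all blocks without a parent.
	return rootBuilder( topLevel );
}

// Usage with hypothetical block types.
registerOutputFormat( 'json', 'toJSON', ( children ) => ( { blocks: children } ) );

const blockTypes = {
	'my-plugin/map': {
		toJSON: ( block ) => ( { type: 'map', ...block.attributes } ),
	},
	'my-plugin/columns': {
		toJSON: ( block, serializedInnerBlocks ) => ( {
			type: 'columns',
			children: serializedInnerBlocks,
		} ),
	},
	'my-plugin/spacer': {}, // Registered, but does not implement toJSON.
};

const postBlocks = [
	{
		name: 'my-plugin/columns',
		attributes: {},
		innerBlocks: [
			{ name: 'my-plugin/map', attributes: { latitude: 52.37, longitude: 4.89 }, innerBlocks: [] },
			{ name: 'my-plugin/spacer', attributes: {}, innerBlocks: [] },
		],
	},
	{
		// No JSON support at all, so only its inner map is passed along.
		name: 'my-plugin/wrapper',
		attributes: {},
		innerBlocks: [
			{ name: 'my-plugin/map', attributes: { latitude: 48.86, longitude: 2.35 }, innerBlocks: [] },
		],
	},
];

const postJson = serializeBlocks( postBlocks, 'json', blockTypes );
console.log( JSON.stringify( postJson, null, 2 ) );
```

In this sketch the spacer disappears from the JSON output, while the map inside the unsupported wrapper still ends up at the top level, matching the "pass along the chain" option preferred above.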

westonruter commented 6 years ago

> I believe this could be a better way of dealing with this particular problem as opposed to the current approach taken in [the AMP plugin] where a post has to be either AMP or not AMP and use AMP specific blocks. If all of the blocks used in it support AMP then why not both? And why AMP specific blocks? Why not normal blocks capable of outputting AMP HTML when required.

The AMP plugin has some AMP-specific blocks because they use AMP components at their core. For example, the MathML block uses <amp-mathml>: this is something that comes with AMP but is not available in vanilla HTML. So when a site is in native AMP mode (where AMP is used on the canonical URLs and there is no non-AMP version available), the plugin currently makes such AMP-specific blocks available in the inserter. There is no non-AMP implementation for these blocks currently, and the MathML block is static, writing <amp-mathml> straight into the post_content.

If a non-AMP implementation were to be provided for a block, then this could be implemented using a dynamic block. The render_callback would merely need to check is_amp_endpoint() and return the AMP markup as opposed to the HTML markup. This is essentially what the plugin is doing for the Categories block.

In general, however, the AMP plugin is designed to take regular HTML as input and do the necessary conversions to AMP (e.g. iframe to amp-iframe), so HTML can be stored as canonical. We are storing some additional data- attribute decorators for core blocks to provide processing instructions for the post-processor to customize the AMP conversion.

If AMP is stored as the canonical static content, however, then the plugin limits the availability so as to not have to reverse engineer the AMP component back to HTML (with jQuery, for example). I worry about adding multiple output formats for a given block because of the additional overhead of maintaining separate versions. We already have this issue today for dynamic blocks with the JS rendering in the editor vs the PHP rendering on the frontend. If we add separate renderings for AMP and JSON and others then this will get further compounded.

I would, however, like to see a content.blocks property exposed in the REST API alongside content.raw and content.rendered, containing the parsed block data. This wouldn't require writing any additional render callbacks.
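To illustrate how a client might use such a property, here is a sketch with a hand-written response object. The content.blocks property is only the one proposed in this comment, and its exact shape is an assumption: each entry mirrors the parsed block structure ( blockName, attrs, innerBlocks ). The findBlocks helper and the block name are hypothetical.

```javascript
// Hand-written example of a post response with the proposed content.blocks.
const examplePost = {
	content: {
		raw: '<!-- wp:my-plugin/map {"latitude":52.37,"longitude":4.89} /-->',
		rendered: '<iframe src="https://maps.example.com/embed?q=52.37,4.89"></iframe>',
		blocks: [
			{
				blockName: 'my-plugin/map',
				attrs: { latitude: 52.37, longitude: 4.89 },
				innerBlocks: [],
			},
		],
	},
};

// Recursively collects every block with the given name, including nested ones.
function findBlocks( blocks, blockName ) {
	return blocks.flatMap( ( block ) => [
		...( block.blockName === blockName ? [ block ] : [] ),
		...findBlocks( block.innerBlocks, blockName ),
	] );
}

const mapBlocks = findBlocks( examplePost.content.blocks, 'my-plugin/map' );
console.log( mapBlocks.map( ( block ) => block.attrs ) );
```

The client reads the map coordinates straight from the block attributes, without any per-block render callback on the server.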

Maybe I'm not seeing additional use cases for this. But these are my initial thoughts in relation to AMP.

adamsilverstein commented 5 years ago

MathML related: I created a MathML block for GB:

https://github.com/adamsilverstein/mathml-block

This seems like plugin territory - not something for core; mentioning it here because of the MathML/AMP discussion above. If we did add a core block the amp-wp plugin could readily convert this to an amp-mathml component for the AMP context.

westonruter commented 5 years ago

@adamsilverstein I opened https://github.com/adamsilverstein/mathml-block/issues/5 for how your block could directly support AMP on its own as well (though the AMP plugin includes a MathML block of its own, via your PR https://github.com/ampproject/amp-wp/pull/943 and https://github.com/ampproject/amp-wp/pull/1165). This "conditionally-dynamic" block idea comes from a conversation with @youknowriad at WCUS.

paaljoachim commented 3 years ago

Can we get an update?

WordPress / gutenberg