WordPress / gutenberg

The Block Editor project for WordPress and beyond. Plugin is available from the official repository.
https://wordpress.org/gutenberg/

Planning for Table of Content Block Functionality and Heading IDs #22874

Open itsjusteileen opened 4 years ago

itsjusteileen commented 4 years ago

This issue continues a discussion, started during a Core Editor chat, about the functionality of a Table of Contents (TOC) block. Currently, there are several PRs and issues that propose possible solutions.

Add Table of Contents block (dynamic rendering + hooks version), PR: https://github.com/WordPress/gutenberg/pull/21234

"Table of Contents" Block: https://github.com/WordPress/gutenberg/issues/11047

Closed PR: https://github.com/WordPress/gutenberg/pull/15426

From a technical standpoint, when working with a TOC block, how are items that aren't in blocks, like headings and next page tags, counted, and how is it determined whether those items precede the current block? Counting Heading blocks is relatively easy, but counting all headings in the HTML is more difficult, and counting all headings in the HTML that precede the current block seems impossible in some situations. The challenge is compounded when the headings are inside a dynamic block.

Resolving these questions impacts:

Specific challenges that need feedback are:

Possible solutions include:

@ZebulanStanphill @mtias @youknowriad @MichaelArestad contributed to the original conversation. Additional feedback here is welcome.

ZebulanStanphill commented 4 years ago

If we create some kind of document outline API, we should probably include page break (<!--nextpage-->) data in it, so you can easily determine what page a block would appear on. That's one of the challenges I've run into with the Table of Contents block PR.

mahnunchik commented 4 years ago

I'm looking forward to having it live.

mcsf commented 4 years ago

Should the TOC block only support Heading Blocks?

I strongly recommend starting with just Heading blocks. This greatly simplifies things in terms of both product and implementation, and removes hurdles to getting started. The question of whether there is an opportunity for supporting other blocks (perhaps via an API at the level of the block type or of the block proper) or for supporting HTML-level indexing of heading tags (in my opinion, something to avoid) can then be explored separately, on top of a finished base.

mcsf commented 4 years ago

Something like allowing a block to declare one of its attributes as contributing to the outline of a document and abstracting away outline/table of contents in a getOutline rather than just getting core/heading.

There are many parallels with the optional HTML anchor feature in core blocks. Recently, #23197 extended this feature to all static core blocks, and it's notable how everything hinges on block types adhering to the feature with a simple supports declaration. One can imagine something similar with ToC:

"supports": {
  "tableOfContents": true
}

Any block type declaring the above would be picked up by a ToC hook. This could then mean that such blocks automatically sport a control to include it in the ToC, or could mean a more subtle experience (e.g. adding an HTML anchor to a block that has tableOfContents: true automatically adds the block to the ToC).
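To make the idea concrete, here is a minimal sketch of how a ToC hook might filter blocks by that flag. Everything in it is an assumption for illustration: `tableOfContents` is only the key proposed in the snippet above, and the `SketchBlockType` and `EditorBlock` shapes, the plain-object registry, and both function names are hypothetical stand-ins rather than the real @wordpress/blocks API.

```typescript
// Hypothetical shapes for illustration; not the real @wordpress/blocks types.
interface SketchBlockType {
  name: string;
  supports?: { anchor?: boolean; tableOfContents?: boolean };
}

interface EditorBlock {
  name: string;
  attributes: { content?: string; anchor?: string };
}

// Keep only blocks whose registered type declares supports.tableOfContents.
function collectTocBlocks(
  blocks: EditorBlock[],
  registry: Record<string, SketchBlockType>
): EditorBlock[] {
  return blocks.filter((block) => {
    // Unregistered block names resolve to undefined and are skipped.
    const type: SketchBlockType | undefined = registry[block.name];
    return (
      type !== undefined &&
      type.supports !== undefined &&
      type.supports.tableOfContents === true
    );
  });
}

// The "more subtle" variant: an opted-in block joins the ToC only once it has an anchor.
function collectAnchoredTocBlocks(
  blocks: EditorBlock[],
  registry: Record<string, SketchBlockType>
): EditorBlock[] {
  return collectTocBlocks(blocks, registry).filter(
    (block) =>
      typeof block.attributes.anchor === "string" &&
      block.attributes.anchor !== ""
  );
}
```

The two functions correspond to the two experiences described above: the first would back an explicit "include in ToC" control, the second ties inclusion to the presence of an HTML anchor.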

ZebulanStanphill commented 4 years ago

@mcsf There's a bit of a problem with "just supporting Heading blocks" in the case of the Table of Contents block. That's easy to do in the editor, but on the front-end, it's a lot more difficult because the JS APIs are not available there. There's no awareness of blocks in the PHP file dynamically rendering the front-end output. So the front-end implementation ends up having to parse HTML, which results in inconsistency between it and the editor implementation.

The Table of Contents block also needs to support paginated posts properly, and this also currently has to be done two different ways depending on if you're in the editor or the front-end.

Right now, the Table of Contents block works perfectly on the front-end, but relies entirely on HTML parsing (which definitely isn't a performant way to handle it). I can't even change the PHP implementation to only work with core Heading blocks, because there's no concept of blocks anymore at that point. The only way to get the necessary data would be through something kinda like the block context system, and no such API relating to headings and page breaks currently exists.

So as far as I can tell, it's not possible to provide a shippable Table of Contents block right now. There is no clean, simple solution, because what the block tries to do requires data that is currently only available by creating temporary clones of the post in memory to parse and scan for specific HTML tags and comment strings.

As far as I can tell, the Table of Contents block needs a table of contents API.

Specifically, here's what the Table of Contents block needs to know in both the editor and the front-end:

To provide this data, Heading blocks will likely have to provide this data to the API:

Page Break blocks will likely have to tell the API that they mark the start of a new page, and therefore all blocks following them should be considered to be on page 2 (or 3, and so on).

All of the data requirements I have just listed are absolutely necessary to make the Table of Contents block work. If any one of these is not provided by some sort of API, then the block has to resort to messy HTML parsing.

(Remember, you can't just provide a list of Heading block clientIds to the API, because the blocks no longer exist at the dynamic rendering stage, so you can't just pull their data during PHP rendering.)
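As a rough sketch of the kind of data such an API could expose, the snippet below walks a flat list of blocks once, records each heading, and bumps a page counter at every Page Break block. The `OutlineEntry` fields (content, level, anchor, page) are an assumption about what the block needs, and `FlatBlock`, `buildOutline`, and `headingsOnPage` are hypothetical names; only the block names `core/heading` and `core/nextpage` and the Heading block's default level of 2 come from core.

```typescript
// Hypothetical, simplified block shape for illustration.
interface FlatBlock {
  name: string;
  attributes: { content?: string; level?: number; anchor?: string };
}

// Assumed outline record: what a ToC would need to render one link.
interface OutlineEntry {
  content: string;
  level: number;
  anchor: string | null;
  page: number;
}

function buildOutline(blocks: FlatBlock[]): OutlineEntry[] {
  const outline: OutlineEntry[] = [];
  let page = 1;
  for (const block of blocks) {
    if (block.name === "core/nextpage") {
      // Every block after a page break belongs to the next page.
      page += 1;
    } else if (block.name === "core/heading") {
      outline.push({
        content: block.attributes.content ?? "",
        level: block.attributes.level ?? 2, // core Heading defaults to h2
        anchor: block.attributes.anchor ?? null,
        page,
      });
    }
  }
  return outline;
}

// Supports "only show headings from the current page".
function headingsOnPage(outline: OutlineEntry[], page: number): OutlineEntry[] {
  return outline.filter((entry) => entry.page === page);
}
```

Because the result is plain data rather than a list of clientIds, the same structure could in principle be produced on the server and in the editor.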

mcsf commented 4 years ago

That's easy to do in the editor, but on the front-end, it's a lot more difficult because the JS APIs are not available there. There's no awareness of blocks in the PHP file dynamically rendering the front-end output. So the front-end implementation ends up having to parse HTML, which results in inconsistency between it and the editor implementation.

I don't follow; why is the ToC back end not consuming the output of the PHP block parser? Even if the server can't parse as fully as the block editor (stage I is block demarcation and explicit attribute parsing; stage II is full attribute sourcing, validation, migration, and is JS-only), there should be enough to get us started, and it will be much faster and safer than ad-hoc parsing of HTML.

Things like pagination support are not necessarily trivial, but would fall into place as soon as we can use the proper parser on the server to clearly identify — always relying on blocks, not HTML — what is a heading, what is a page boundary, and what else is heading-like.
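A minimal sketch of what consuming parser output could look like, written in TypeScript for illustration even though the real code would be PHP. `ParsedBlock` mirrors the array shape that `parse_blocks()` returns (`blockName`, `attrs`, `innerBlocks`, `innerHTML`); `OutlineToken`, `stripTags`, and `collectOutlineTokens` are hypothetical names. It assumes the heading text has to be recovered from `innerHTML`, since at this stage only explicitly serialized attributes such as `level` are available, and the tag stripping is deliberately naive.

```typescript
// Mirrors the shape of one entry returned by PHP's parse_blocks().
interface ParsedBlock {
  blockName: string | null; // null for freeform HTML between blocks
  attrs: Record<string, unknown>;
  innerBlocks: ParsedBlock[];
  innerHTML: string;
}

type OutlineToken =
  | { kind: "heading"; level: number; text: string }
  | { kind: "pageBreak" };

// Naive tag stripper; good enough for a sketch, not for untrusted markup.
function stripTags(html: string): string {
  return html.replace(/<[^>]*>/g, "").trim();
}

// Depth-first walk that returns headings and page breaks in document order.
function collectOutlineTokens(blocks: ParsedBlock[]): OutlineToken[] {
  const tokens: OutlineToken[] = [];
  for (const block of blocks) {
    if (block.blockName === "core/heading") {
      const rawLevel = block.attrs["level"];
      const level = typeof rawLevel === "number" ? rawLevel : 2;
      tokens.push({ kind: "heading", level, text: stripTags(block.innerHTML) });
    } else if (block.blockName === "core/nextpage") {
      tokens.push({ kind: "pageBreak" });
    }
    // Recurse so headings nested in container blocks are found too.
    for (const nested of collectOutlineTokens(block.innerBlocks)) {
      tokens.push(nested);
    }
  }
  return tokens;
}
```

The walk relies only on block names and attributes to decide what is a heading or a page boundary; HTML is touched solely to read the text of a block already identified as a heading.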


What page will I be on in the front-end? (Necessary to support only showing headings from the current page.)

This might be something that the (environment-agnostic) block context API nicely solves.

ZebulanStanphill commented 4 years ago

Hmm... I'd forgotten about the PHP block parser. Thanks for reminding me. You're right that I could use that on the PHP implementation. I'm currently not using it because my current implementation is still trying to support 3rd party heading blocks. If I switch to sourcing the data from block attributes, I have to drop support for all headings outside of the core Heading block.

It's also worth noting that even headings in our own Custom HTML block will be ignored by a Table of Contents implementation that only checks Heading block attributes. My thinking was that if we had a table of contents API, we could at least update the Custom HTML block to provide data to the API so they would work as expected.

Would a Table of Contents block that only supports core Heading and Next Page blocks be acceptable? It feels kind of wrong to ship it without 3rd party block support. But if desired, I can update my PR to work that way.

Still, though, it seems less than ideal to parse the whole post for block data whenever it encounters a Table of Contents block.

ZebulanStanphill commented 4 years ago

Also, I'm not certain that post pagination info can be provided through the block context API. If a whole post is considered a single source of data, how can it provide different answers to "what page am I on?"... it seems like you'd have to use "Page" blocks to divide up the post, rather than marker points like the current Next Page block. But maybe the block context API is more powerful than I think?

ZebulanStanphill commented 3 years ago

Having thought about this for a while, it's clear to me now that block context can't solve this. Block context provides data from a parent to its children, but in the case of page breaks, there's no parent to provide this info.

If we were to redesign WordPress from scratch, paginated posts could have been implemented via a "Page" block that would contain all the content that goes on that page. However, that's not how things are. Page breaks are determined at the seam between one page and the next, via the <!--nextpage--> comment tag. Even if the Page Break block provided block context, it wouldn't be able to provide it to anything. Block context is parent-to-child, not sibling-to-sibling.

I don't want to prematurely abandon a potential path forward, though... so here's a question: would it be feasible to deprecate the <!--nextpage--> tag and recommend that users use the aforementioned hypothetical "Page" block? That would allow block context to be used to easily solve the pagination issue. However, I fear it might be considered too incompatible with existing posts. But then again, if you're intending to add a Table of Contents block, you're already actively editing existing content, so maybe the cost isn't as big as it seems?
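To illustrate the difference between the two models, here is a small sketch that converts a marker-delimited block list into hypothetical "Page" containers, after which a page number can flow from parent to child. The `hypothetical/page` block name, the `SimpleBlock` shape, and both functions are invented for this example; no such block exists in core.

```typescript
// Hypothetical, simplified block shape for illustration.
interface SimpleBlock {
  name: string;
  innerBlocks: SimpleBlock[];
}

const PAGE_BREAK_NAME = "core/nextpage";
const PAGE_BLOCK_NAME = "hypothetical/page"; // does not exist in core

// Turn sibling markers into parent containers: each page break starts a new Page.
function wrapInPageBlocks(blocks: SimpleBlock[]): SimpleBlock[] {
  let current: SimpleBlock = { name: PAGE_BLOCK_NAME, innerBlocks: [] };
  const pages: SimpleBlock[] = [current];
  for (const block of blocks) {
    if (block.name === PAGE_BREAK_NAME) {
      current = { name: PAGE_BLOCK_NAME, innerBlocks: [] };
      pages.push(current);
    } else {
      current.innerBlocks.push(block);
    }
  }
  return pages;
}

// With containers in place, "what page am I on?" is parent-provided data.
function childrenWithPageContext(
  pages: SimpleBlock[]
): Array<{ blockName: string; page: number }> {
  const result: Array<{ blockName: string; page: number }> = [];
  pages.forEach((pageBlock, index) => {
    for (const child of pageBlock.innerBlocks) {
      result.push({ blockName: child.name, page: index + 1 });
    }
  });
  return result;
}
```

Note that a trailing page break yields an empty final Page in this sketch; a real migration would have to decide how to treat that case.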

This still doesn't solve the headings issue, however. As far as I can tell, we have to support 3rd-party heading blocks. Even within core, the Heading block isn't the only reasonable place to put an <h1>-<h6> element. It's just as valid to put one in a Custom HTML block, isn't it? And there are already other blocks in core like Site Title that have to use heading elements. For these reasons, I still think a document outline API is required to solve the issue with headings.

mcsf commented 3 years ago

I think we have to accept that trade-offs will be made, and make a choice we can be happy with. Otherwise, this feature will crumble under the weight of its requirements.

My own opinion is that we should optimise for:

  1. Picking up Heading blocks
  2. Handling pagination

and that this can come at the expense of:

  1. Prematurely devising APIs for heading-like blocks/tokens
  2. Supporting all PHP-generated scenarios

The choices above are in order of preference. So I think it's better to ditch premature APIs than to ditch support for dynamic content. This makes it easier to let the editor itself generate a static ToC, but I think we can still leverage existing hooks in the WP back end and make sure the ToC is present at the top of each page. For example:

$pages = apply_filters( 'content_pagination', $pages, $post );

— in class-wp-query.php
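As a loose illustration of that idea, the sketch below models what a callback attached to the `content_pagination` filter might do with the array of page strings it receives. It is written in TypeScript only to keep it self-contained; the real callback would be PHP. `prependTocToPages` and the injected `renderToc` function are hypothetical, and how the ToC markup for a page is produced is left open.

```typescript
// Models the $pages array passed to the content_pagination filter:
// one HTML string per page of the post.
type RenderToc = (pageHtml: string, pageNumber: number) => string;

// Returns a new array in which every page starts with its own ToC markup.
function prependTocToPages(pages: string[], renderToc: RenderToc): string[] {
  return pages.map((pageHtml, index) => {
    const pageNumber = index + 1; // pages are 1-based on the front end
    return renderToc(pageHtml, pageNumber) + pageHtml;
  });
}
```

This only covers the "ToC at the top of each page" placement; it says nothing about a ToC block placed elsewhere in the content.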

ZebulanStanphill commented 3 years ago

Just to be clear, do you think we should support 3rd-party heading blocks or not? There are already many plugins that add some variation of an "advanced heading" block, including:

And this isn't taking into account other blocks that use headings, such as accordion blocks.

This makes it easier to let the editor itself generate a static ToC, but I think we can still leverage existing hooks in the WP back end and make sure the ToC is present at the top of each page.

I don't think I understand what you're trying to say here? My Table of Contents block can be placed anywhere from the start of the page to the very end, and there can be multiple instances of it. (This is useful for allowing each page of a paginated post to have its own table of contents.)

It's also worth pointing out that the reason my Table of Contents block is dynamic is that altering the static output every time a heading changed resulted in two undo steps being created rather than just one. Pressing undo just once would change the table of contents, but not the heading. So unless someone can come up with an alternative solution there, the Table of Contents block has to be completely dynamic.

I do agree it would be best to try and solve this problem without introducing new APIs if possible. To that end, I've tried my best to complete the Table of Contents block in #21234, and at the moment the implementation certainly works in all likely situations, but I am concerned about the performance of the block, and there are a few edge cases that I can't handle without adding even more performance overhead. If you have any suggestions on how to proceed there, let me know.

mcsf commented 3 years ago

It's also worth pointing out that the reason my Table of Contents block is dynamic is that altering the static output every time a heading changed resulted in two undo steps being created rather than just one. Pressing undo just once would change the table of contents, but not the heading. So unless someone can come up with an alternative solution there, the Table of Contents block has to be completely dynamic.

I think it's fine to keep it dynamic, as long as the block in the editor still accurately represents the final output. That said, just to touch on the undo question — in case you aren't familiar with it yet — __unstableMarkNextChangeAsNotPersistent may be the answer.
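For readers unfamiliar with the concept, the toy model below shows why marking a follow-up change as non-persistent collapses two undo steps into one. It is not how the block editor store is implemented and does not call any Gutenberg API; `ToyUndoStack` and its methods are hypothetical and only mimic the general idea behind the action name.

```typescript
// Toy undo history: each persistent change pushes the previous state.
class ToyUndoStack<T> {
  private past: T[] = [];
  private present: T;
  private skipNext = false;

  constructor(initial: T) {
    this.present = initial;
  }

  // The next set() will replace the present state without creating an undo level.
  markNextChangeAsNotPersistent(): void {
    this.skipNext = true;
  }

  set(next: T): void {
    if (this.skipNext) {
      this.skipNext = false;
    } else {
      this.past.push(this.present);
    }
    this.present = next;
  }

  undo(): void {
    const previous = this.past.pop();
    if (previous !== undefined) {
      this.present = previous;
    }
  }

  current(): T {
    return this.present;
  }
}

interface ToyDocument {
  heading: string;
  toc: string;
}
```

In this model, a heading edit followed by a ToC sync marked as non-persistent leaves a single undo level, so one undo restores both the heading and the ToC together.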

I do agree it would be best to try and solve this problem without introducing new APIs if possible. To that end, I've tried my best to complete the Table of Contents block in #21234, and at the moment the implementation certainly works in all likely situations, but I am concerned about the performance of the block, and there are a few edge cases that I can't handle without adding even more performance overhead

Thanks for the work you're doing there. I've been meaning to review, and I think it's a great feature, but haven't found enough time yet.

Just to be clear, do you think we should support 3rd-party heading blocks or not? There are already many plugins that add some variation of an "advanced heading" block, including:

In the long run, the editor should understand that, beyond core/heading, certain blocks act as headings, thus automatically allowing a Table of Contents block to pick them out from the content. But I don't think it's something that needs to be solved before we can implement a ToC block, and my point was that the current discussions around ToC should focus on the most correct implementation we can design without compromising for third-party blocks.

Other efforts out there, such as semantic template parts (#27337), deal with a similar ontological problem. Even if the domain is very different — templates and template parts — it's something to keep an eye on and learn from.

As always, the duty and luxury with Gutenberg is that we're building for the long run. So we can afford to take time to get some of these things right. I mean, just look at how many times we've visited footnotes (#1890) over nearly four years! So, to distill my original message: let's start by building a good ToC block in that it works well, feels right, and treats user data well. Only then should we worry about widening the reach of that feature.