alphagov / tech-docs-gem

Gem to distribute the tech docs project
https://tdt-documentation.london.cloudapps.digital/
MIT License
15 stars 38 forks source link

Incorrect links generated in multi-page TOC #150

Open alext opened 5 years ago

alext commented 5 years ago

On PaaS, we're currently seeing an issue where the links in the generated ToC contain incorrect anchors.

For example if you go to the PaaS docs, and inspect the links in the "Deploy a backing or routing service" -> "Mysql" section you'll see that they have a spurious 'mysql' prefixed to them. (For context, all the subsections under "Deploy a backing or routing service" use a common set of subheadings, but are on different pages, so they don't actually conflict). If you then navigate into the Mysql section, its links are now correct, but the Postgresql section now has incorrect links etc...

I've done some digging, and I think the problem lies with the combination of the render call here, and the UniqueIdentifierGenerator singleton. This singleton only gets reset on each middleman app invocation, and therefore when these renderings happen, it already contains all the anchors from everything that has been rendered so far for this page, which leads to the conflicts arising when they're not actually conflicts. There looks to have been an attempt to fix something like this in c15333e, but that would only seem to work for the current page being rendered, not any other pages.

When testing locally, I inserted a GovukTechDocs::UniqueIdentifierGenerator.instance.reset call before that render, and that seemed to prevent the problem. I haven't been able to understand the sequence of things enough to know whether that would have other unintended consequences. It also feels like quite a layering violation.

AP-Hunt commented 5 years ago

tldr @JonathanHallam and I took a look at this for an hour or so on 2019-04-12 and came to the conclusion that, without much more time invested, this is probably a can't/won't fix.

We discovered that templates can use HTML headings with id's as a workaround for this problem, because they aren't interpreted in any way.

# Heading 1

vs

<h1 id="heading-1">Heading 1</h1>

A bit more detail

In lib/govuk_tech_docs/tech_docs_html_renderer.rb, the function header receives the text of a heading and its level (ie h{1-6}). It then calls UniqueIdentifierGenerator.instance.create to generate a unique id for the heading, and combines all 3 things in to an HTML heading

def header(text, level)
    anchor = UniqueIdentifierGenerator.instance.create(text, level)
    %(<h#{level} id="#{anchor}">#{text}</h#{level}>)
end

UniqueIdentifierGenerator has usually-valid behaviour for generating unique identifiers on page, allowing for headings to be identical.

unless unique?(anchor)
    anchor = prefixed_by_parent(anchor, level)
end

In a single page context this behaviour is correct, but when generating a table of contents that spans multiple pages, this results in the behaviour seen in this report.

Unfortunately, the methods described have insufficient information to determine if a given heading is unique in a multiple page context or not. Thus, we think it isn't fixable.