gohugoio / hugo

The world’s fastest framework for building websites.
https://gohugo.io
Apache License 2.0
76.08k stars 7.54k forks source link

Auto-generated header ids can contain invalid CSS selectors #8997

Open MarkvanMents opened 3 years ago

MarkvanMents commented 3 years ago

What version of Hugo are you using (hugo version)?

$ 88.1

Does this issue reproduce with the latest release?

Yes - this is the latest version

If I have a markdown header which begins with a number Hugo generates an id beginning with a number. For example ## 1 Test generates <h2 id="1-test">1 Test…</h2> image

According to w3.org (https://www.w3.org/TR/CSS21/syndata.html#characters) In CSS, identifiers (including element names, classes, and IDs in selectors) … cannot start with a digit, two hyphens, or a hyphen followed by a digit.

This causes the error Uncaught DOMException: Failed to execute 'querySelector' on 'Document': '#1-test' is not a valid selector. if I try to do a document.querySelector() on this selector. In tests, this seems to be caused by the leading digit.

image

The querySelector code is used, for example, in a recent Pull Request on the Docsy theme (https://github.com/google/docsy/pull/506)

This format of anchor is produced with both Blackfriday and Goldmark as the markup generator and is independent of the setting of autoHeadingIDType.

jmooring commented 3 years ago

Workarounds...

1) Use an attribute (either format):

## 1 Test {#_1-test}

## 2 Test {id="_2-test"}

2) Use a render hook:

layouts/_default/_markup/render-heading.html

{{- $anchor := replaceRE `(^\d)` "_$1" .Anchor -}}
<h{{ .Level }} id="{{ $anchor }}">{{ .Text | safeHTML }}</h{{ .Level }}>
{{- /* Remove trailing newlines */ -}}

Thoughts...

Hugo is generating valid HTML, so I'm not convinced that this is a bug.

MarkvanMents commented 3 years ago

Hi, Thanks for your response. Thanks also for the workarounds, the first is impractical on my site, but I will investigate the second.

Your response has triggered me to do a bit more research. My thoughts are as follows:

So agreed: it is not a bug. Can I make it a feature request?

jmooring commented 3 years ago

@MarkvanMents This would be a breaking change (e.g., existing links to example.org/foo#1-test would be broken). Given that (a) this is the first time the issue has been raised, and (b) there is an easy workaround, I am inclined to leave this alone.

Thoughts?

MarkvanMents commented 3 years ago

Hi Joe, I tried using the render hook. That changes the anchor but it doesn't help as the Table of Contents is already built and so the links there no longer work. I couldn't even get a render-link to do the replacement (replaceRE seemed to do nothing in my render-link.html, I was probably doing something wrong), but I don't think it would help with the Table of Contents anyway. Therefore, I wouldn't class this as an "easy workaround", nor is manually inserting anchors for every heading on 1000 pages.

My thoughts:

  1. It still seems strange to me that an effort is made to sanitize the anchors and it does everything you would expect for HTML 4 except remove the leading number.
  2. You could make any change backwardly compatible by having a switch in the config file. superSafeAnchors with a default of false.
  3. I'm the only one who seems to have this issue, so it isn't worth doing just for me. I'll look to see if there is a viable alternative to using the querySelector technique I was trying to apply from the Docsy PR I mentioned above. Perhaps if a lot of people wanted it, this could be resurrected.

Thanks very much for trying to help me solve my particular issue. In general, Hugo is a great product and builds our site incredibly quickly. Mark

jmooring commented 3 years ago

@MarkvanMents

I apologize; I forgot about the TOC. I raised https://github.com/gohugoio/hugo/issues/8383 about 6 months ago.

@bep

Despite the breaking change, I am in favor of changing the behavior when autoHeadingIDType is either github or github-ascii. I propose inserting an underscore before a leading digit (e.g., 1-test would become _1-test).

The breakage is, obviously, limited to headings that begin with a digit, and will manifest in:

Alternatives:

  1. Do nothing
  2. Address #8383
  3. Limit change to autoHeadingIDType github-ascii
  4. Create another autoHeadingIDType
istr commented 3 years ago

@jmooring

Hugo is generating valid HTML, so I'm not convinced that this is a bug.

I would tend to argue that it does not make much sense to auto-generate element IDs that are not "addressable" using CSS selectors; HTML5 that is formally valid but does not interoperate properly with CSS could still considered to be broken (my 2 cents, ymmv), so as a consequence the current behavior could be regarded "bug, not feature".

I would opt for 4. because that could be used to introduce a "sane/safe" autoHeadingID mechanism that could then replace the current (slightly broken) default at some point in the future.

@MarkvanMents it might still be useful to notice, that in context of scripting selecting the element by means of a CSS selector is not the greatest idea, if you know beforehand that you will select that element by ID. Unlike document.querySelector('#1-two-three-four'); the more "natural choice" document.getElementById('1-two-three-four'); works and I bet that it (still, with modern DOM engines) will outperform querySelector by far with IDs that are valid in the CSS context.

jmooring commented 3 years ago

@jgielstra

introduce a "sane/safe" autoHeadingID mechanism that could then replace the current (slightly broken) default at some point in the future

If we're going to break some eggs, I vote to break them now (< v1.0). But this isn't going anywhere without @bep's guidance.

MarkvanMents commented 3 years ago

Hi @istr and @jmooring

Thanks for this discussion. If I had any say, I would probably also go for option 4 as github doesn't remove leading numbers when it autogenerates anchors, so the existing name would be misleading. It also makes the change non-breaking.

Thanks also for the pointer to .getElementById. I am not a JavaScript developer and found the code I was trying to implement when basing it on someone else's code. Always a risky thing to do, but there seemed to be several people involved in writing it, so it seemed safe! Luckily, I'm a better tester than JavaScript developer. I'll try using your suggestion instead.

Thank you both for taking the time to help me, and sort out this strange behaviour.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. The resources of the Hugo team are limited, and so we are asking for your help. If this is a bug and you can still reproduce this error on the master branch, please reply with all of the information you have about it in order to keep the issue open. If this is a feature request, and you feel that it is still relevant and valuable, please tell us why. This issue will automatically be closed in the near future if no further activity occurs. Thank you for all your contributions.

bep commented 2 years ago

Sorry for coming late to the party.

I would say starting title with a number should be rare (I guess you're doing that to order them ... but isn't that what CSS is for?), which also means that fixing this shouldn't create too much protests (cross fingers!).