gohugoio / hugo

URL in title breaks ToC #7158

Open thomwiggers opened 4 years ago

thomwiggers commented 4 years ago

Steps to reproduce the behavior:

# some nice stuff ([url][])

[url]: https://google.com

in a docs page with toc=true.

What version of Hugo are you using (hugo version)?

$ hugo version
Hugo Static Site Generator v0.68.3-157669A0/extended linux/amd64 BuildDate: 2020-03-24T12:13:38Z

Result:

image

(Page source at https://github.com/thomwiggers/thomwiggers.nl/blob/new-site/content/teaching/hacking-in-c-2020/syllabus.md)

Does this issue reproduce with the latest release?

yes

(Originally reported at https://github.com/gcushen/hugo-academic/issues/1637)

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. The resources of the Hugo team are limited, and so we are asking for your help. If this is a bug and you can still reproduce this error on the master branch, please reply with all of the information you have about it in order to keep the issue open. If this is a feature request, and you feel that it is still relevant and valuable, please tell us why. This issue will automatically be closed in the near future if no further activity occurs. Thank you for all your contributions.

thomwiggers commented 3 years ago

Still happens

thomwiggers commented 3 years ago

Still happens

davidsneighbour commented 3 years ago

Of course it still happens. A table of contents cannot support links in headlines: each item in a ToC is itself a link to a section, and you can't have links within links. Other applications have the same limitation.

The silence in response to your issue is because it's impossible to add a system that changes this behaviour. HTML could be stripped from the header, but then formatting like bold or italic would disappear too. Move the link out of your headline and please stop removing the stale label.

thomwiggers commented 3 years ago

For starters, that's quite a hostile and toxic reply.

Markdown supports links in headings just fine. That should not justify the ToC HTML breaking. I'd be fine with URLs being stripped from the ToC text and there's no reason why that should not work or be incompatible with other HTML in the headers.

thomwiggers commented 2 years ago

image

Still happens

thomwiggers commented 1 year ago

still happens

thomwiggers commented 1 year ago

I dug a bit in the Hugo source code, and the problem seems to be

https://github.com/gohugoio/hugo/blob/0f01bd46374b13cdc5d7925c913bba777a58bb5b/markup/goldmark/toc.go#L56

That line just takes the arbitrary HTML content of the title as-is; it should probably sanitize it first.
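To illustrate why passing the heading HTML through unchanged breaks the ToC, here is a minimal sketch. `tocEntry` and the heading ID `some-nice-stuff-url` are hypothetical and only for illustration; this is not Hugo's actual code, just the general shape of wrapping a heading's rendered HTML in a ToC link.

```go
package main

import "fmt"

// tocEntry is a hypothetical helper (not Hugo's implementation).
// It wraps the heading's rendered HTML in a link to the heading,
// without sanitizing that HTML first.
func tocEntry(id, headingHTML string) string {
	return fmt.Sprintf(`<li><a href="#%s">%s</a></li>`, id, headingHTML)
}

func main() {
	// Rendered HTML of the heading from the reproduction steps.
	// The ID below is an assumed value for illustration.
	heading := `some nice stuff (<a href="https://google.com">url</a>)`

	// The result contains an <a> nested inside another <a>,
	// which is invalid HTML and gets rearranged by browsers.
	fmt.Println(tocEntry("some-nice-stuff-url", heading))
}
```

The printed list item has the external link nested inside the ToC link, which matches the kind of broken markup described in this issue.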

thomwiggers commented 3 months ago

Still happens.

jmooring commented 2 months ago

git clone --single-branch -b hugo-github-issue-7158 https://github.com/jmooring/hugo-testing hugo-github-issue-7158
cd hugo-github-issue-7158
hugo server

thomwiggers commented 2 months ago

For reference, @jmooring's POC shows the following:

image

jmooring commented 2 months ago

@thomwiggers This is not a cheap problem to solve. We'd need to parse the heading and walk the HTML nodes, using something like net/html, and remove all anchor elements but keep their inner HTML. Stripping all HTML tags is obviously not an option (e.g., markdown might be ## My _emphasized_ heading).

GitHub's automatic TOC for README files does handle this case, but it simplifies the problem by stripping all HTML tags, which is not an acceptable approach.

thomwiggers commented 2 months ago

Yeah, that makes sense, especially as doing it on the presumably-already-parsed Markdown level will probably not be sufficient (e.g. ## <a href="stuff">test</a>)
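The idea discussed above (drop anchor elements from the rendered heading HTML but keep their inner HTML) can be sketched roughly as follows. Note the swap: this uses a standard-library regular expression as a stand-in for a real parser such as net/html, so it assumes simple, well-formed tags and would be fooled by a `>` inside an attribute value. `stripAnchors` and `anchorTag` are hypothetical names, not part of Hugo.

```go
package main

import (
	"fmt"
	"regexp"
)

// anchorTag matches opening and closing <a> tags only.
// Requiring whitespace or ">" right after the "a" keeps tags
// such as <abbr> or <article> from matching.
var anchorTag = regexp.MustCompile(`(?i)</?a(?:\s[^>]*)?>`)

// stripAnchors is a hypothetical helper: it removes the <a> tags
// themselves and leaves everything between them untouched.
func stripAnchors(headingHTML string) string {
	return anchorTag.ReplaceAllString(headingHTML, "")
}

func main() {
	headings := []string{
		`some nice stuff (<a href="https://google.com">url</a>)`, // link from Markdown
		`My <em>emphasized</em> heading`,                         // formatting is kept
		`<a href="stuff">test</a>`,                               // raw HTML in the heading
	}
	for _, h := range headings {
		fmt.Println(stripAnchors(h))
	}
}
```

Because this works on the rendered HTML rather than on the parsed Markdown, the raw-HTML heading is handled the same way as the Markdown link. A production fix would more likely walk a parsed node tree, as suggested earlier in the thread.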