Heading levels in Markdown table of contents

Parent5446 commented 8 years ago

So I'm not sure if this is at all possible, or what the best way would be to do this, but we're having some issues using the .TableOfContents variable in templates.

There are a couple of points:

1. Semantically, HTML should only have one <h1> tag per section root.
2. It would be nice to have the top-level, i.e., <body>-level, <h1> tag generated in the layout template using the .Title attribute of the page. (It works better semantically, rather than having the title in two places.)
3. When rendering Markdown, the table of contents in .TableOfContents, sensibly, only renders navigation for the headers in the actual content.
4. Furthermore, .TableOfContents always treats <h1> as top-level, even if there are no <h1>-level headers.

Because of this, if you comply with (1) and implement (2), and thus only have <h2> or lower headers in your content, the generated table of contents contains an empty top-level <nav> as a result of (3) and (4).

Example table of contents:

<nav id="TableOfContents">
    <ul>
        <li>
            <ul>
                <li><a href="#introduction:e95c9b0e3cf17661856239295171d427">Introduction</a></li>
                <li><a href="#at-a-glance:e95c9b0e3cf17661856239295171d427">At a Glance</a></li>
            </ul>
        </li>
    </ul>
</nav>

This messes with the page semantically since now the navigation has an empty top-level. The way I see it there are two ways to fix this:

1. Somehow get the header tags in layout templates into the table of contents, so the top-level is not blank.
2. Get the renderer to remove empty levels, e.g., by treating <h2> as top-level headers if there is no <h1> in the content.

I'm not sure if there is currently an undocumented workaround to implement either of these solutions. But if there isn't, would there be a way to allow for using either of the two solutions to achieve a more semantic table of contents?
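To make the second option concrete, here is a minimal sketch in plain JavaScript, outside of Hugo. It assumes the headings are available as a flat list of `{ level, text }` objects, which is not something `.TableOfContents` provides; the function name `promoteLevels` is hypothetical and only illustrates the idea of treating the shallowest heading present as top-level.

```javascript
// Hypothetical helper, not a Hugo or Blackfriday API:
// shift heading levels so the shallowest heading present becomes level 1.
function promoteLevels(headings) {
  if (headings.length === 0) return [];
  const minLevel = Math.min(...headings.map((h) => h.level));
  return headings.map((h) => ({ ...h, level: h.level - minLevel + 1 }));
}

// Content that only uses <h2> and <h3>, as in the scenario described above.
const promotedExample = promoteLevels([
  { level: 2, text: 'Introduction' },
  { level: 3, text: 'Details' },
  { level: 2, text: 'At a Glance' },
]);
console.log(promotedExample.map((h) => h.level));
```

With levels shifted this way, a renderer would no longer need an empty wrapper level for the missing <h1>.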

bep commented 8 years ago

I have not read this in detail, but we get the ToC from https://github.com/russross/blackfriday -- so maybe it is better to discuss it there.

Dr-Terrible commented 8 years ago

> This messes with the page semantically since now the navigation has an empty top-level. The way I see it there are two ways to fix this: [...]

I am affected by this odd behaviour too. The way .ToC is rendered by Hugo/Blackfriday makes the entire concept of a ToC pretty much useless. In the worst case, my ToC is completely messed up and no longer reflects the original header structure in my Markdown files. Sometimes the ToC's headers are so messed up that the browser doesn't render them correctly.

The issue here is that Blackfriday spits out a bunch of hard-coded HTML tags, and then Hugo wraps them in a way that is not even valid HTML.

A solution is quite simple: don't give users a preformatted .ToC, just give them an indexed array and then let them generate the desired HTML structure by iterating over the array elements.

moorereason commented 8 years ago

@Parent5446 and @Dr-Terrible, it would be great if one of you would create a Blackfriday issue for this.

bep commented 8 years ago

The bottom line of this is:

The ToC should not be HTML; it should be a data structure that people can do with as they please.
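As a rough illustration of what "a data structure" could enable, the sketch below renders a flat list of entries into nested lists in plain JavaScript. The entry shape `{ level, id, text }` and the names `renderToc` and `escapeHtml` are assumptions made for this example; nothing here reflects an actual Hugo API. The renderer never descends more than one level at a time, so it does not emit the empty wrapper <li> discussed in this issue.

```javascript
// Escape text for safe use inside HTML content and attribute values.
function escapeHtml(s) {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// entries: flat array of { level, id, text } in document order (assumed shape).
function renderToc(entries) {
  if (entries.length === 0) return '';
  const base = Math.min(...entries.map((e) => e.level));
  let html = '';
  let depth = 0; // number of <ul> elements currently open
  for (const e of entries) {
    // Clamp so that a heading is at most one level deeper than its predecessor.
    const target = Math.min(e.level - base + 1, depth + 1);
    if (target > depth) {
      html += '<ul>'; // open a nested list inside the current <li>
      depth = target;
    } else {
      html += '</li>'; // close the previous item
      while (depth > target) {
        html += '</ul></li>'; // climb back up to the target level
        depth -= 1;
      }
    }
    html += `<li><a href="#${escapeHtml(e.id)}">${escapeHtml(e.text)}</a>`;
  }
  html += '</li>';
  while (depth > 1) {
    html += '</ul></li>';
    depth -= 1;
  }
  return html + '</ul>';
}
```

A template author could then wrap the result in whatever <nav> or <div> markup their theme needs, which is the flexibility the preformatted HTML does not offer.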

ErjanGavalji commented 8 years ago

Okay, for some reason, even though toc_levels is explicitly set to 1..6 both in the page and in the config file, kramdown does not generate ids.

I'm putting this on hold for the time being because I'd prefer to test it with a newer Jekyll version (and dependencies), which will take some more time.

I found that the level-4 subsections are not that large, so we could do without links there. What do you think, guys?

Cheers, Erjan

helmbold commented 8 years ago

I've written a little tool that removes the unnecessary level of nesting from the table of contents. You can run it, after Hugo has generated the contents in the "public" folder.

DavidCRivera commented 8 years ago

I'm running into an issue that may be related. It seems that when I have a lower level tag i.e. an H6 appearing first in my content before an H5, a TOC is rendered at the start of my content, without my explicitly including a .TableOfContents tag.

What I mean is, if my content looks like:

##### This is an H5 #####

###### This is an H6 ######

Then all is right with the world, and no TOC gets generated. But if it looks like this:

###### This is an H6 ######

##### This is an H5 #####

Then I get the garbled TOC HTML appearing arbitrarily at the start of my content. It doesn't even have the "TableOfContents" ID; it's just a naked NAV tag. Seems like a bug...

bep commented 7 years ago

This issue has been automatically marked as stale because it has not been commented on for at least four months.

The resources of the Hugo team are limited, and so we are asking for your help.

If this is a bug and you can still reproduce this error on the master branch, please reply with all of the information you have about it in order to keep the issue open.

If this is a feature request, and you feel that it is still relevant and valuable, please tell us why.

This issue will automatically be closed in four months if no further activity occurs. Thank you for all your contributions.

bep commented 7 years ago

Note/Update: This issue is marked as stale, and I may have said something earlier about "opening a thread on the discussion forum". Please don't.

If this is a bug and you can still reproduce this error on the latest release or the master branch, please reply with all of the information you have about it in order to keep the issue open.

If this is a feature request, and you feel that it is still relevant and valuable, please tell us why.

helmbold commented 7 years ago

This issue is still unresolved.

thewebmastercom commented 7 years ago

Any news on this? It appears to be still unresolved.

mikeblum commented 7 years ago

I found this open issue when trying to figure out why Hugo's {{ .TableOfContents }} didn't seem to work properly / wasn't styleable. I created a partial for generating TOC trees based on header tags. The general philosophy being that TOCs need to be customisable so I figured that'd work best with a partial. This is more of a way to render headers within a .Content block rather than a formal data structure.

snippet from partials/table-of-contents.html:

<!-- ignore empty links with + -->
{{ $headers := findRE "<h[1-6].*?>(.|\n)+?</h[1-6]>" .Content }}
<!-- at least one header to link to -->
{{ $has_headers := ge (len $headers) 1 }}
<!-- a post must enable the Table of Contents with toc: true (unset or false disables it) -->
{{ $show_toc := (eq $.Params.toc true) }}
{{ if and $has_headers $show_toc }}
<div class="table-of-contents toc bd-callout">
    <!-- TOC header -->
    <h4 class="text-muted">Table of Contents</h4>
    {{ range $headers }}
        {{ $header := . }}
        {{ range first 1 (findRE "<h[1-6]" $header 1) }}
            {{ range findRE "[1-6]" . 1 }}
                {{ $next_heading := (int .) }}
                <!-- generate li array of the proper depth -->
                {{ range seq $next_heading }}
                    <ul class="toc-h{{ . }}">
                {{end}}
                {{ $base := ($.Page.File.LogicalName) }}
                {{ $anchorId := ($header | plainify | htmlEscape | urlize) }}
                {{ $href := delimit (slice $base $anchorId) "#" | string }}
                <a href="{{ relref $.Page $href }}">
                    <li>{{ $header | plainify | htmlEscape }}</li>
                </a>
                <!-- close list -->
                {{ range seq $next_heading }}
                    </ul>
                {{end}}
            {{end}}
        {{end}}
    {{ end }}
</div>
{{ end }}

Here is how I have it working to render TOCs for Posts:

<div class="content">
    {{ partial "banner" . }}
    {{ partial "table-of-contents" . }}
    <!-- supports emoji -->
    {{ .Content | emojify }}
</div>

jirfag commented 7 years ago

@mikeblum thank you for this snippet! I used it to make a Bootstrap-styled table of contents; my snippet is here

vassudanagunta commented 7 years ago

Once we have #1778, we can more easily provide the TOC as a data structure, using access to the syntax tree, or perhaps writing a new renderer to build during parsing.

vassudanagunta commented 6 years ago

> Once we have #1778

I meant to say, once we have #3949 (Upgrade to Blackfriday v2)...

lb13 commented 6 years ago

@mikeblum thank you!

alexislg2 commented 6 years ago

@mikeblum thanks a lot! I still have an issue with the anchor link. This ($header | plainify | htmlEscape | urlize) does not work in several cases. Examples:

Bonjour, ca va ? should return bonjour-ca-va but returns bonjour-ca-va- (note the hyphen at the end)

Also, it does not work with apostrophes. Both the href and the title come out wrong. For example, let's go gives letamprsquos-go

I am not a Go expert, so I cannot help.

branw commented 6 years ago

@alexislg2 The last two functions are incorrect. plainify returns a string that is already escaped, so we have to htmlUnescape it. Furthermore, anchors are generated using the anchorize function (a BlackFriday provided feature), not urlize.

Here are the relevant changes to get the partial @mikeblum posted working correctly:

{{ $anchorId := ($header | plainify | htmlUnescape | anchorize) }}
{{ $href := delimit (slice $base $anchorId) "#" | string }}
<li><a href="{{ relref $.Page $href }}">
    {{ $header | plainify | htmlUnescape }}
</a></li>

skyzyx commented 6 years ago

I implemented the toc as a partial using code from above, but the logic of the code produced markup that was invalid and not semantically sound. So I rewrote it like so:

https://gist.github.com/skyzyx/a796d66f6a124f057f3374eff0b3f99a

This version intentionally only looks for h2…h4. This is because the page title is the h1, and everything else is h2 or below. I also choose to stop at h4 because the value to the reader beyond that is — in my experience — negligible.

Feel free to re-adjust the regexes if you want a broader spectrum of headers.

yihui commented 6 years ago

In case anyone is interested, I just wrote a short JS script to remove the non-existent h1 in TOC so that it can start from h2 instead: https://github.com/yihui/misc.js/blob/main/js/fix-toc.js One advantage of this solution is that it does not assume whether your TOC starts from h1 or h2.

You can include the script via something like <script src="/js/fix-toc.js"></script> after you put it under the static/js/ directory of your site.

Here is an example. If your eyes are quick enough, you can actually see the first <ul> in TOC quickly removed :)

VincentTam commented 6 years ago

@skyzyx Thanks for sharing. I'm implementing this in my Hugo blog on GitLab under /pages/*/index.md, but it generates an unordered list of links pointing to /post/*/index.md. I believe I'll probably end up with errors similar to those in my recent failed job.

@yihui Thanks for your JavaScript, even though it works only if there's more than one section. I've adapted your script to Beautiful Hugo and published it as a GitLab snippet.

// Copyright (c) 2017 Yihui Xie & 2018 Vincent Tam under MIT

(function() {
  var toc = document.getElementById('TableOfContents');
  if (!toc) return;
  do {
    var li, ul = toc.querySelector('ul');
    // stop when no list is left or the list has more than one item
    if (!ul || ul.childElementCount !== 1) break;
    li = ul.firstElementChild;
    if (li.tagName !== 'LI') break;
    // remove <ul><li></li></ul> where the <ul> contains only one <li>
    ul.outerHTML = li.innerHTML;
  } while (toc.childElementCount >= 1);
})();

ghost commented 6 years ago

> I implemented the toc as a partial using code from above, but the logic of the code produced markup that was invalid and not semantically sound. So I rewrote it like so:
>
> https://gist.github.com/skyzyx/a796d66f6a124f057f3374eff0b3f99a
>
> This version intentionally only looks for h2…h4. This is because the page title is the h1, and everything else is h2 or below. I also choose to stop at h4 because the value to the reader beyond that is, in my experience, negligible.
>
> Feel free to re-adjust the regexes if you want a broader spectrum of headers.

This works a treat @skyzyx 😃

Beej126 commented 6 years ago

here's another twist on collapsing empty heading levels using hugo static generation vs recurring javascript overhead... the gist is to use string replace to target the empty levels... there's no conditional looping in hugo templates yet so i just applied the basic approach 3 times which covers all my markup scenarios

note the pattern on closing tag carriage returns is slightly different than opening tags

{{ $toc := .TableOfContents }}
{{ $toc := (replace $toc "<ul>\n<li>\n<ul>" "<ul>") }}
{{ $toc := (replace $toc "<ul>\n<li>\n<ul>" "<ul>") }}
{{ $toc := (replace $toc "<ul>\n<li>\n<ul>" "<ul>") }}
{{ $toc := (replace $toc "</ul></li>\n</ul>" "</ul>") }}
{{ $toc := (replace $toc "</ul></li>\n</ul>" "</ul>") }}
{{ $toc := (replace $toc "</ul></li>\n</ul>" "</ul>") }}
<!-- count the number of remaining li tags -->
<!-- and only display ToC if more than 1, otherwise why bother -->
{{ if gt (len (split $toc "<li>")) 2 }}
  {{ safeHTML $toc }}
{{ end }}

VincentTam commented 6 years ago

@Beej126 Thanks for your code. :smile: That's much better than the JavaScript approach. I've tested this for my blog and it works perfectly.

ghost commented 6 years ago

@Beej126 @VincentTam Looks like there might be potential for a loop there given that the same commands are run three times each...

VincentTam commented 6 years ago

@ryanwhocodes That's more elegant, but it won't save you any lines because the beginning and the end of the loop take two lines.

Beej126 commented 6 years ago

@ryanwhocodes - it seems like the ideal loop would be a conditional check on whether the last replace had any hits... but we only get a "range" style looping in hugo so far... i.e. finite list iteration... which suggests a "split" approach to generate array... but i couldn't think of a good pattern to split on that would be reliable with nested ul-li nodes... if you can see a good strategy please suggest
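For comparison, this is roughly what the "repeat until the last replace had no hits" loop looks like in a language that has conditional loops. It is a JavaScript sketch, not Hugo template code, and it assumes the input is only the <ul>…</ul> part of the ToC without the surrounding <nav>. The names `collapseEmptyLevels` and `isBalanced` are made up for this example. It only strips a wrapper when the outermost list consists of a single link-less <li> that holds one nested list.

```javascript
// True when every </ul> in the fragment closes a <ul> opened inside it.
function isBalanced(fragment) {
  let depth = 0;
  for (const m of fragment.matchAll(/<(\/?)ul\b[^>]*>/g)) {
    depth += m[1] ? -1 : 1;
    if (depth < 0) return false;
  }
  return depth === 0;
}

// Repeatedly unwrap <ul><li><ul>INNER</ul></li></ul> into <ul>INNER</ul>
// until the outermost list no longer has that shape.
function collapseEmptyLevels(listHtml) {
  const wrapper = /^\s*<ul>\s*<li>\s*<ul>([\s\S]*)<\/ul>\s*<\/li>\s*<\/ul>\s*$/;
  let current = listHtml;
  for (;;) {
    const m = wrapper.exec(current);
    // Stop when there is no empty wrapper, or when the match spans
    // more than one nested list (INNER would then be unbalanced).
    if (!m || !isBalanced(m[1])) return current;
    current = `<ul>${m[1]}</ul>`;
  }
}
```

Because the loop stops on its own, it handles one, two or more empty levels without repeating the same replacement a fixed number of times, and it leaves lists that are legitimately nested untouched.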

mithuns commented 6 years ago

So, is this now supported out of the box, without modifying any partials? Can someone please post which property turns this on for a post's md file?

skyzyx commented 6 years ago

@helmbold, please no passive-aggressive comments. Nobody owes you (or anybody else) anything.

This is open-source software. If you want this feature so badly, why not offer to sponsor development with cash? Or contribute, yourself?

As a maintainer of a very popular piece of OSS software, I can speak first-hand about the difficulty of trying to handle development and support of OSS, on top of a daytime job + family time + time for myself.

Please, straighten out your perspective.

helmbold commented 6 years ago

@skyzyx Yes, you're right! I've deleted my comment since I would not like to read such comments in my own project.

pyrrho commented 6 years ago

The code that @mikeblum and @skyzyx provided got me pretty far, but I was bitten by a pathological case; one of my pages had numerous headings that contained the same text (Don't ask. It's... it's this whole big thing), and so generated the same id when passed through | plainify | htmlUnescape. The generated ToC only anchored to the first of these redundant headings.

Blackfriday has a workaround for this case already. It appends a counter to the end of the id for each identically-named heading, so rather than having five my-annoying-heading ids, it will generate my-annoying-heading, my-annoying-heading-1, my-annoying-heading-2, etc.
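A simplified JavaScript sketch of that counter scheme follows. It is not Blackfriday's code: the function name `uniqueIds` is hypothetical, the numbering mirrors the HTML output posted later in this thread (headers, headers-1, headers-2), and it deliberately ignores the corner case where a heading's own id already ends in a counter.

```javascript
// Simplified illustration: give repeated ids a numeric suffix.
// The first occurrence keeps the plain id; later ones get -1, -2, ...
function uniqueIds(ids) {
  const seen = new Map(); // id -> number of duplicates handed out so far
  return ids.map((id) => {
    const count = seen.get(id);
    if (count === undefined) {
      seen.set(id, 0);
      return id;
    }
    seen.set(id, count + 1);
    return `${id}-${count + 1}`;
  });
}

console.log(uniqueIds(['headers', 'headers', 'headers']));
```

A ToC that regenerates ids from heading text has to replay the same counting in the same document order, which is why reading the id back from the rendered heading is less fragile.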

Long story short, I re-tooled the code already shared in this thread to extract the id from the headings rather than re-generate it from the contained text, and to be a bit more verbose about the sub-\

Hope it helps. https://gist.github.com/pyrrho/1d77cdb98ba58c7547f2cdb3fb325c62

Edit [20 Nov]: @mikeblum's question about explicit heading IDs made me realize my code was deficient when the headings were anything but text. I've expanded the code linked above with a substantially more complex test set, and the ability to correctly translate Markdown syntax (e.g. **strong**, _em_, [links]()), HTML (e.g. <span style="color: red;">explicit blocks</span>), emoji, and the like to the generated <ul>.
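The idea of extracting the id from each rendered heading, instead of re-deriving it from the text, can be sketched in JavaScript as follows. This is not the code from the gist; `extractHeadings` is a hypothetical name, and the regular expressions assume reasonably regular renderer output with double-quoted id attributes.

```javascript
// Collect { level, id, html } for every rendered heading that carries an id.
// The inner HTML is kept as-is so that inline markup and emoji survive.
function extractHeadings(contentHtml) {
  const heading = /<h([1-6])\b([^>]*)>([\s\S]*?)<\/h\1>/g;
  const result = [];
  for (const m of contentHtml.matchAll(heading)) {
    const idMatch = /(?:^|\s)id="([^"]*)"/.exec(m[2]);
    if (!idMatch) continue; // nothing to link to
    result.push({ level: Number(m[1]), id: idMatch[1], html: m[3].trim() });
  }
  return result;
}
```

Since the ids come straight from the rendered markup, duplicate headings and explicitly assigned ids resolve to the anchors the renderer actually emitted.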

xenophenes commented 6 years ago

@pyrrho Your solution has come the closest to working for me, but I run into the errors:

error calling partial: template: theme/partials/toc.html:28:53: executing "theme/partials/toc.html" at <after 1>: error calling after: no items left

I'm reading through & trying to understand the code and why "after 1" would be failing - any ideas?

pyrrho commented 6 years ago

To answer your question immediately, @xenophenes: no, I have no idea what that message is suggesting. I'd love to dig into it and try to make this snippet more robust, though. I'd ask that we move that discussion to the gist, so we don't clutter this issue with debugging back-and-forth, and so I have a record there of what broke and (hopefully) how it was fixed.

mikeblum commented 5 years ago

After taking into account @branw 's changes (thanks by the way!) I've found some issues with how Hugo auto-generates the header ids in BlackFriday:

TOC:

<a href="/post/table-of-contents/#no-entry-sign-headers">
    🚫 headers
</a>

target header:

<h1 id="not-supported">🚫 headers</h1>

I tweaked @branw's fix to at least cosmetically support emoji:

{{ $base := ($.Page.File.LogicalName) }}
{{ $anchorId := ($header | plainify | htmlUnescape | anchorize) }}
{{ $href := delimit (slice $base $anchorId) "#" | string }}
<li>
  <a href="{{ relref $.Page $href }}">
    {{ $header | plainify | htmlUnescape | emojify }}
  </a>
</li>

and tried adding this to my config.toml:

[blackfriday]
  angledQuotes = true
  extensions = ["hardLineBreak"]
  fractions = false
  plainIDAnchors = true

but still no dice on supporting complex headers with UTF-8 nonsense in them. Is there a hook in the processing pipeline to create manual header ids? Ideally I think having the id be generated with

{{ $anchorId := ($header | plainify | htmlUnescape | anchorize) }}

would work nicely but I'm sure there are edge cases that that doesn't take into account.

pyrrho commented 5 years ago

@mikeblum it looks to me like Blackfriday is stripping the UTF-8 characters (emoji) from the generated IDs, the same way it strips special ASCII characters (&, %, $, etc.):

Input Markdown

## 🚫 headers &&
## 🚫 headers &&
## 🚫 headers &&

Output HTML

<h2 id="headers">🚫 headers &amp;&amp;</h2>
<h2 id="headers-1">🚫 headers &amp;&amp;</h2>
<h2 id="headers-2">🚫 headers &amp;&amp;</h2>

There is extended-markdown syntax for explicitly setting a heading's id, by the by:

Input Markdown

## 🚫 headers {#customized-no-entry-sign-header}

Output HTML

<h2 id="customized-no-entry-sign-header">🚫 headers</h2>

percygrunwald commented 5 years ago

This thread was really helpful for me. I also created a partial to generate a table of contents for h2 ~ h4:

https://gist.github.com/percygrunwald/043e577beb90db72e09727a3ed3053c3

I commented this one pretty heavily because I had a hard time figuring out what was going on, so it might be useful for someone not that familiar with Hugo's templating syntax. The reason I made my own is that I found the output HTML for some of the examples here was not valid (too many or too few closing tags), which caused problems when the HTML was minified.

Here's a quick preview of the outcome:

And you can see it live here.

travismiller commented 5 years ago

Something not mentioned yet is a CSS only approach. It's not necessarily semantic, but has worked for my needs.

#TableOfContents > ul {
    list-style: none;
    margin: 0;
    padding: 0;
}

askmrsinh commented 5 years ago

I stumbled across this issue today. This is the solution that I am currently using to get rid of the empty top-level <li> (i.e., when there is no h1 tag in the {{.Content}} portion):

{{ $emtLiPtrn := "(?s)<ul>\\s<li>\\s<ul>(.*)</li>\\s</ul>" }}
{{ $rplcEmtLi := "<ul>$1" }}
{{ .TableOfContents | replaceRE $emtLiPtrn $rplcEmtLi | safeHTML }}

It is by no means perfect but gets the job done without any JS or too many lines of code. The only issue I came across is when lower-level heading tags (e.g., h6) appear before higher-level tags (e.g., h5-h1), or when I skip a heading level in my {{.Content}}. However, this is not very common.

helmbold commented 5 years ago

Since this bug is still not fixed, I've found a simple fix with CSS.

The following code hides the first, empty list item that leads to an orphaned bullet point:

#TableOfContents>ul {
  padding: 0;
}

#TableOfContents>ul>li{
  list-style: none;
}

koddr commented 5 years ago

A small tweak to @helmbold's solution (for the hello-friend-ng theme):

#TableOfContents > ul {
  padding: 0;
+  margin-left: 0;
}

#TableOfContents > ul > li {
  list-style: none;
}

HenrySkup commented 5 years ago

> here's another twist on collapsing empty...

this works (with limited testing) -- no need to worry about <h1>s or CSS or anything like that

{{- $toc := .TableOfContents -}}
{{- $toc := replaceRE `<ul>\n<li>\n<ul>` `<ul>` $toc -}}
{{- safeHTML $toc -}}

bep commented 5 years ago

I'm reopening this as I'm implementing ToC for Goldmark (the new and improved Markdown renderer in the next Hugo release).

bep commented 5 years ago

I suggest that we fix this in a non-magic way and add some settings for toc, e.g. startLevel (inclusive, default 2) and stopLevel (inclusive, default 3)
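To show what such settings would do, here is a small JavaScript sketch of inclusive level filtering. The option names startLevel and stopLevel and their defaults are taken from the suggestion above; the function name `filterByLevel` and the `{ level, text }` entry shape are assumptions for illustration, and the final configuration keys in Hugo may differ.

```javascript
// Keep only headings whose level lies within [startLevel, stopLevel], inclusive.
// Defaults follow the values proposed in the comment above.
function filterByLevel(entries, { startLevel = 2, stopLevel = 3 } = {}) {
  return entries.filter(
    (e) => e.level >= startLevel && e.level <= stopLevel
  );
}

const levelExample = filterByLevel([
  { level: 1, text: 'Page title' },
  { level: 2, text: 'Introduction' },
  { level: 3, text: 'Details' },
  { level: 4, text: 'Fine print' },
]);
console.log(levelExample.map((e) => e.text));
```

With the defaults, a page whose title is the only <h1> gets a ToC that starts at <h2>, so there is no empty top level to collapse.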

github-actions[bot] commented 2 years ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

gohugoio / hugo

Heading levels in Markdown table of contents #1778