commonmark / cmark

CommonMark parsing and rendering library and program in C

HTML table-of-contents renderer #78

Open craigbarnes opened 8 years ago

craigbarnes commented 8 years ago

I'm working on replacing libmarkdown (Discount) with libcmark in one of my projects, but progress is currently blocked by the lack of HTML table-of-contents generation in libcmark. I initially thought I could just add a new cmark_render_html_toc() function to use alongside the existing cmark_render_html(). However, hyperlinking from the ToC items to the headings also requires a link anchor for each generated heading, which cmark_render_html() currently doesn't include.

Does it sound reasonable to implement both of these changes here? If so, any thoughts on the best way to do so? Should the link anchors always be included, or should they be configurable?

jgm commented 8 years ago

The issue of link anchors is a tricky one and it has been extensively discussed here, as you probably know: https://talk.commonmark.org/t/feature-request-automatically-generated-ids-for-headers/115/10

I'm thinking about the following approach:

With this setup, you could postprocess the AST after parsing and add header identifiers, using any scheme that makes sense for your application, and you could also insert a TOC at that point. This could be done using the cmark API and the iterator interface.

What do you think? If we wanted to, we could provide default functions for doing these transformations, and include an option with the command-line tool.
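As a rough illustration of the post-processing step described above, here is a small sketch in Python. It is deliberately not written against the cmark C API: it assumes the headings have already been collected from the AST as (level, text) pairs, and `add_ids` and `render_toc` are hypothetical helper names. The id scheme is only a placeholder that an application would replace with its own.

```python
import html
import re


def add_ids(headings):
    """Attach an id to each (level, text) heading.

    Placeholder scheme (an assumption, not a standard): position plus a slug.
    """
    entries = []
    for index, (level, text) in enumerate(headings, start=1):
        slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-") or "section"
        # Prefixing the position keeps ids unique even for repeated titles.
        entries.append((level, f"sec{index}-{slug}", text))
    return entries


def render_toc(entries):
    """Render nested <ul> lists from (level, id, text) triples."""
    parts = []
    stack = []  # heading levels of the currently open <ul> lists
    for level, ident, text in entries:
        # Close lists that are deeper than the current heading.
        while stack and level < stack[-1]:
            parts.append("</li></ul>")
            stack.pop()
        if stack and level == stack[-1]:
            parts.append("</li>")  # sibling: close the previous item
        else:
            parts.append("<ul>")  # deeper (or first) heading: open a list
            stack.append(level)
        parts.append(f'<li><a href="#{ident}">{html.escape(text)}</a>')
    parts.extend("</li></ul>" for _ in stack)
    return "".join(parts)
```

Calling `render_toc(add_ids([(1, "Intro"), (2, "Install")]))` yields a two-level list whose links point at the generated ids. The renderer would still need to emit the same ids on the headings themselves, which is the part the discussion here is about.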

A more flexible approach would be to add an attributes field to every node: this would be a linked list of attribute, string-value pairs.

None of this would require the spec to say anything in particular about header identifiers.

@nwellnhof @MathieuDuponchelle @coding-horror @vmg @gjtorikian I'd be interested in any thoughts on this too.

MathieuDuponchelle commented 8 years ago

Not sure about that, the problem with this approach is portability.

The use case for anchors most people are interested in, afaict, isn't table of contents generation, but in-document linking. If each tool uses the API in a different way to set ids, then switching tools to render the same manually written markdown file containing links to anchors breaks those links.

My preferred solution would be to have a simple add-anchors option, which would generate ids guaranteed to be valid in html, like the auto_identifiers extension does in pandoc.

jgm commented 8 years ago

As you can see from the linked talk page, there are lots of differences of opinion about how the automatic links should be generated, or whether they should be generated at all. It may be that different kinds of sites are going to need different approaches for this. And so I'm not persuaded that the spec should demand a particular method for generating automatic IDs... The pandoc method has the drawback that if you re-order sections in your document, and some sections have the same name, links may break.
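To make the re-ordering drawback concrete, here is a small Python sketch of a pandoc-style scheme. It is simplified from pandoc's documented auto_identifiers rules (it ignores inline formatting and footnotes, and only handles ASCII letters), and `slugify` and `assign_ids` are illustrative names, not part of pandoc or cmark.

```python
import re


def slugify(title):
    """Simplified pandoc-style identifier for a heading title."""
    # Keep ASCII alphanumerics, underscores, hyphens, periods and whitespace.
    text = re.sub(r"[^\w\s.-]", "", title, flags=re.ASCII)
    # Whitespace becomes hyphens; everything is lowercased.
    text = re.sub(r"\s+", "-", text.strip()).lower()
    # Drop everything before the first letter.
    text = re.sub(r"^[^a-z]+", "", text)
    return text or "section"


def assign_ids(titles):
    """Give repeated slugs a numeric suffix in document order."""
    seen = {}
    ids = []
    for title in titles:
        base = slugify(title)
        count = seen.get(base, 0)
        ids.append(base if count == 0 else f"{base}-{count}")
        seen[base] = count + 1
    return ids
```

With the sections `["Linux", "Install", "macOS", "Install"]`, the Linux install section gets the id `install` and the macOS one gets `install-1`. If the two platforms are swapped, the ids swap too, so an existing link to `#install` silently lands on the other section.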

The proposal here would be compatible with eventually standardizing on one way of doing this. But it would make it possible for people to add IDs now, in a way that suits their purposes, without any decision being made in the spec.

MathieuDuponchelle commented 8 years ago

> The pandoc method has the drawback that if you re-order sections in your document, and some sections have the same name, links may break.

Right, but there is no perfect solution to this. Writers wanting to make sure such links don't break could use the "custom attribute syntax" once it has been agreed upon :)

westurner commented 5 years ago

GitHub generates unique IDs and anchors for CommonMark headings. How do they do it?

See:

kivikakk commented 5 years ago

It is done in a post-processing step, similarly to how we linkify @mentions (like @westurner), add issue references (#78), etc. I can detail the (very simple) algorithm for generating the anchors if you like, but in short, it's not related to CommonMark processing.

westurner commented 5 years ago

Having auto-generated tables of contents would save a lot of developer time.

"Let's just let the community make multiple incompatible implementations" has resulted in the status quo: still no autogenerated TOC for CommonMark documents.

AFAIU, one argument presented in [1] is that:

As a result:

[1] https://talk.commonmark.org/t/feature-request-automatically-generated-ids-for-headers/115

[2] https://github.com/github/markup/issues/904

How about:

ghost commented 2 years ago

A table of contents is a presentation issue. Adding a syntax to inject a ToC adds little information to the document that cannot already be derived from the document headings. Take a look at KeenWrite's PDF themes:

https://github.com/DaveJarvis/keenwrite/blob/master/docs/screenshots.md#pdf-themes

In particular, look at the upper-right corner of the following image:

https://raw.githubusercontent.com/DaveJarvis/keenwrite/master/docs/images/screenshots/08.png

That ToC (in green) was generated by first converting Markdown to XHTML using flexmark-java, then the XHTML was typeset using ConTeXt. The ConTeXt typesetting engine provides control over the ToC colours, number of levels, leader dots, font sizes, location in the document, etc.

pandoc has the same functionality for generating a ToC in HTML pages by passing the --toc command-line option.

IMO, GitHub needs to add an externally defined configuration file that instructs its Markdown parser how to generate the corresponding HTML output. This could include exporting a ToC, tweaking the heading level depth, and defining variables that are interpolated.

Consider a file named .config.yaml:

---
meta:
  toc:
    insert: true
    depth: 3
application:
  name: My Super App
  version: 1.2.3

Alongside the following README.md:

# {{application.name}}

Changes to version {{application.version}} include:

* Bug fix
* Feature creep

Would render as:


My Super App

Changes to version 1.2.3 include:

* Bug fix
* Feature creep

What would be great to standardize is the configuration file syntax so that pandoc, GitHub, and Markdown renderers/editors could all parse the same standard metadata.

Updating .config.yaml as part of the build process ensures that the application name, version, and other build-related information have a single source of truth.
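The interpolation step proposed above could look roughly like the following Python sketch. This is not an existing GitHub or cmark feature: the `{{dotted.name}}` syntax is taken from the README example, the configuration is inlined as a dictionary mirroring .config.yaml (to avoid depending on a YAML parser), and `lookup` and `interpolate` are hypothetical names.

```python
import re

# Mirrors the .config.yaml example; a real tool would load this from the file.
CONFIG = {
    "meta": {"toc": {"insert": True, "depth": 3}},
    "application": {"name": "My Super App", "version": "1.2.3"},
}


def lookup(config, dotted_key):
    """Resolve a dotted name such as 'application.name' in nested dicts."""
    value = config
    for part in dotted_key.split("."):
        value = value[part]  # raises KeyError for unknown names
    return value


def interpolate(text, config):
    """Replace every {{dotted.name}} in text with its configured value."""
    return re.sub(
        r"\{\{\s*([\w.]+)\s*\}\}",
        lambda match: str(lookup(config, match.group(1))),
        text,
    )
```

Running `interpolate` over the README source before handing it to the Markdown parser keeps the substitution independent of CommonMark itself. Raising on unknown names is one possible choice; a tool might prefer to leave unknown placeholders untouched.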

westurner commented 2 years ago

While that may all be true, by comparison all I need to do with docutils (.rst on github) is this:


.. contents::
   :depth: 5

(And it works with standalone .rst documents viewed on GitHub, with JAMstack tools like Sphinx (also just within/for subheadings), and with Jupyter-Book (MyST Markdown, Notebooks, .rst), which need to be called by a GitHub Action.)

https://docutils.sourceforge.io/docs/ref/rst/directives.html#table-of-contents

https://jupyterbook.org/file-types/restructuredtext.html#including-restructuredtext-in-markdown

https://jupyterbook.org/customize/toc.html#generate-a-table-of-contents-from-content-files

FWIW, FWIU, this is the {MyST Markdown} way to call the docutils Table of Contents directive and pass arguments:

:depth: 5
:local:
:backlinks: entry
:class: css-classname

So, if {markdown, CommonMark, } were to just copy and extend those options/args/parameters/kwargs from e.g. docutils and add a citation, we would have a portable TOC: Table of Contents directive that also works in pandoc.
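As a sketch of how little machinery such options need, the following Python snippet parses docutils-style `:name: value` option lines and applies a depth limit to a list of (level, text) headings. It is an assumption-laden illustration, not docutils' or MyST's implementation: `parse_options` and `limit_depth` are hypothetical names, `:depth:` is treated here as an absolute heading level, and the other options are parsed but not acted on.

```python
def parse_options(block):
    """Parse ':name: value' lines; options without a value become True."""
    options = {}
    for line in block.splitlines():
        line = line.strip()
        if not line.startswith(":"):
            continue  # ignore anything that is not an option line
        key, _, value = line[1:].partition(":")
        options[key.strip()] = value.strip() or True
    return options


def limit_depth(headings, options, default_depth=6):
    """Keep only (level, text) headings at or above the configured depth."""
    depth = int(options.get("depth", default_depth))
    return [(level, text) for level, text in headings if level <= depth]
```

For the option block shown above, `parse_options` returns `depth` as the string `"5"`, `local` as `True`, and the remaining options as plain strings.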
