dart-lang / markdown

A Dart markdown library
https://pub.dev/packages/markdown
BSD 3-Clause "New" or "Revised" License

Anchor links for header with link are broken on pub.dev #563

Open SandroMaglione opened 1 year ago

SandroMaglione commented 1 year ago

On the fpdart pub.dev page, the anchors to some headers are broken.

The broken headers include a link:

### [Task](/packages/fpdart/lib/src/task.dart)

On the GitHub repository the anchor works correctly.

On pub.dev, however, the anchor id added on click, #task, does not work. The id that actually works there is #taskpackagesfpdartlibsrctaskdart.

Issue on fpdart's repository

lrhn commented 1 year ago

TL;DR: Seems like a bug in the const HeaderWithIdSyntax() extension. It doesn't convert the content to text before creating the ID, so the resulting ID contains text that doesn't appear visibly in the header. And it differs from what GitHub does.

The source code for the link to the header, in packages/fpdart/README.md, is

  - [Task](#task)

In CommonMark syntax, that's an external link to #task, which means clicking it works like navigating to <currentUrl>#task.

GitHub Flavored Markdown (GFM) makes it work as a link to the internal header with text "task", which it gives a name/id of "task". (Or something to that effect, using scripting.)

The target is

### [Task](/packages/fpdart/lib/src/task.dart)

So, this sounds like a bug in the GFM-web extension of package:markdown, which generates the wrong ID. The ID should be based on the ASCII text content of the header, not the source. Links should be removed. (Image links are apparently entirely removed, which can cause the link to contain a -- sequence.)

The generated link of #taskpackagesfpdartlibsrctaskdart has taken all the words of the header source, but it shouldn't have included the words inside the (...).
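To make the difference concrete, here is a minimal Python sketch. It is not the package:markdown implementation; the `slug` helper is hypothetical and only assumes a simple rule (lowercase, drop everything except word characters, hyphens and spaces, then turn spaces into hyphens). Applied to the raw header source it reproduces the long id seen on pub.dev, and applied to the visible text it gives the id that the README link expects.

```python
import re


def slug(text: str) -> str:
    """Hypothetical slug rule: lowercase, keep only word characters,
    hyphens and spaces, then replace each space with a hyphen."""
    kept = re.sub(r"[^\w\- ]", "", text.strip().lower())
    return kept.replace(" ", "-")


# The header content exactly as written in the README source.
raw_source = "[Task](/packages/fpdart/lib/src/task.dart)"
# The text a reader actually sees in the rendered header.
visible_text = "Task"

print(slug(raw_source))    # taskpackagesfpdartlibsrctaskdart
print(slug(visible_text))  # task
```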

Example of what GFM does:

# TOC

* Goto [simple header](#a-simple-header)
* Goto [text header](#a-link-text-header)
* Goto [text and image header](#a-link-text--header)
* Goto [silly header](#a-link---with-1--multiple%CC%81-spaces_and-2--int%C3%A9rnal-punctuation-and-3--html--face-header)

### A simple header

Simple, not?

### A [Link text](http://example.com) header

Text, not?

### A [Link text](http://example.com) ![with image](http://example.com/favicon.gif) header

More text?

###  &#x41; link - __with__ (1)  multiple&#x0301;-**spaces**_and (2)  _int&eacute;rnal_-p/u/nc+tua!tion <sup>*and*</sup><a name="xx"/> ($3)  <span color="red">html 😝 face</span> header   

Silly, yes!

So, the algorithm seems to be something like this: for a header line #+ (.*), extract an ID from the (.*) as:

That's not a universal rule; it's GitHub-specific. But it's what we should assume for the README.md of a pub package, especially if it has a GitHub repo, and probably in general. (For example, Gerrit seems to use a different algorithm, whose description is also silent on internal markdown. Some strategies remove accents from letters; GitHub does not.)
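As a rough illustration of the behaviour observed in the first three example headers, the following Python sketch reduces a header to its visible text before building the id. It is an approximation inferred from those examples only, not GitHub's actual code or the package:markdown API; the names `strip_inline_markdown` and `github_style_slug` are made up for this sketch, and it handles only inline links and images. HTML tags, entities, emphasis markers and the other constructs in the "silly" header are deliberately out of scope.

```python
import re


def strip_inline_markdown(header: str) -> str:
    """Reduce a header to visible text (links and images only)."""
    # Images are removed entirely: ![alt](url) -> ""
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", header)
    # Links keep only their text: [text](url) -> text
    return re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)


def github_style_slug(header: str) -> str:
    """Assumed rule: lowercase, keep word characters, hyphens and spaces,
    then replace each space with a hyphen (runs are not collapsed)."""
    text = strip_inline_markdown(header).strip().lower()
    text = re.sub(r"[^\w\- ]", "", text)
    return text.replace(" ", "-")


headers = [
    "A simple header",
    "A [Link text](http://example.com) header",
    "A [Link text](http://example.com) "
    "![with image](http://example.com/favicon.gif) header",
    "[Task](/packages/fpdart/lib/src/task.dart)",
]
for header in headers:
    print(github_style_slug(header))
# a-simple-header
# a-link-text-header
# a-link-text--header
# task
```

Removing the image leaves two adjacent spaces, and because each space becomes its own hyphen, the third id contains the -- sequence mentioned above.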

kevmoo commented 1 year ago

@srawlins ?