facebook / docusaurus

Easy to maintain open source documentation websites.
https://docusaurus.io
MIT License
56.72k stars 8.54k forks source link

onBrokenAnchors reports named anchor tags as broken links #9808

Open jrahal-mx opened 9 months ago

jrahal-mx commented 9 months ago

Have you read the Contributing Guidelines on issues?

Prerequisites

Description

When you use named anchors (E.g., <a name="anchor">Some Text</a> or self-closing ) in a doc page, onBrokenAnchors reports it as a broken link.

Reproducible demo

No response

Steps to reproduce

  1. In docusaurus.config.js, set onBrokenAnchors to Warn (if not using the default setting).
  2. Edit a doc page.
  3. Add a named anchor anywhere on the page. E.g., <a name="anchor">Some Text</a>
  4. Add a link to the named anchor, e.g., [Link Text](#anchor)
  5. Save the page.
  6. Build the site (yarn build).

Expected behavior

No console feedback. onBrokenAnchors ignores the arbitrary named anchor because it's only supposed to work for anchors declared with the Heading component.

Actual behavior

onBrokenAnchors reports the named anchor as a broken anchor. Example:

[WARNING] Docusaurus found broken anchors!

Please check the pages of your site in the list below, and make sure you don't reference any anchor that does not exist.
Note: it's possible to ignore broken anchors with the 'onBrokenAnchors' Docusaurus configuration, and let the build pass.

Exhaustive list of all broken anchors found:
- Broken anchor on source page path = /user-types-and-permissions:
   -> linking to #anchor (resolved as: /user-types-and-permissions#anchor)

Note that on the rendered page, the anchor works.

Your environment

  • Public source code:
  • Public site URL: (issue encountered locally during build)
  • Docusaurus version used: 3.1.1
  • Environment name and version (e.g. Chrome 89, Node.js 16.4): Version 120.0.6099.129 (Official Build) (arm64)
  • Operating system and version (e.g. Ubuntu 20.04.2 LTS): macOS Ventura 13.4.1 (22F82)

Self-service

  • [ ] I'd be willing to fix this bug myself.
slorber commented 9 months ago

We'll try to improve this, but this is currently the intended behavior, because all anchors must be "collected" for the broken link checker to see them.

CF-related issue: https://github.com/facebook/docusaurus/issues/9763#issuecomment-1905657358

The current suggested workaround (requiring v3.1.1) is:

import Link from '@docusaurus/Link';

<Link id="my-anchor"/>

Please note that <a name="xyz"/> is not valid anymore as per HTML5 spec, that's why I choose to not support it in our Link component. See https://stackoverflow.com/questions/484719/should-i-make-html-anchors-with-name-or-id


I'm keeping this open for now, as there's an opportunity for us to improve the DX.

But please share more details about your use case. For now, it's hard to know if you created that JSX inside a React file or an MDX file, and that matters to me to design an improved solution, for which the technical solution could be radically different (remark plugin VS custom JSX pragma).

Reading the repro steps, I kind of understand that the anchor is in an MDX doc file right? Would you expect this to also work inside a React file such as src/pages/index.js?

jrahal-mx commented 9 months ago

Yes exactly, my immediate use is to be able to have arbitrary (non-heading) anchors in MDX files. I haven't needed to use them in React pages.

I would love for onBrokenAnchors to be able to validate arbitrary anchors, along with heading anchors, in MDX pages. That's the case with the workaround you provided (Thanks!).

If onBrokenAnchors can't validate arbitrary anchors, I'd expect it to ignore them instead of mistakenly reporting them as broken.

slorber commented 9 months ago

I would love for onBrokenAnchors to be able to validate arbitrary anchors, along with heading anchors, in MDX pages. That's the case with the workaround you provided (Thanks!).

We'll try to add support for <div id="anchor"/>, <a id="anchor"/> etc in Markdown files.

However it's not possible to add support for <MyCustomComponent id="anchor"/> and you'll have to keep using the Docusaurus API in that custom component.

If onBrokenAnchors can't validate arbitrary anchors, I'd expect it to ignore them instead of mistakenly reporting them as broken.

We validate each link anchor against all the anchors we successfully extracted from the target page, so it's important to collect all of them.

For a given page, we either validate or not, but there's no middle ground. What you ask for basically means disabling the broken anchor check

jrahal-mx commented 9 months ago

We'll try to add support for <div id="anchor"/>, <a id="anchor"/> etc in Markdown files.

That would be great!

For a given page, we either validate or not, but there's no middle ground. What you ask for basically means disabling the broken anchor check

I understand. That's not what I want. I would much rather the arbitrary anchors get checked along with the heading anchors. It's just that right now, <a> anchors return false errors because they're not supported, which makes the broken anchors check noisy.

I really appreciate your taking the time to consider this, BTW.

nijikokun commented 7 months ago

Is there some way to suppress this warning, this is currently blocking us from upgrading

slorber commented 7 months ago

@nijikokun a warning does not prevent your site from building, and yes this is configurable as explained in the docs and release notes: https://docusaurus.io/blog/releases/3.1#broken-anchors-checker

onBrokenAnchors: "ignore"

Disabling it is not a big deal, it just reverts to what we had in v3.0-

But it would be better to fix the broken anchors and help us figure out when we report false positives (if it's a different case from the one above)

CalvinWilkinson commented 6 months ago

I am having the same issue but I am using MD and MDX files. I guess I just don't understand why this would be an issue when using markdown files because they get transpiled by docusaurus. Wouldn't docusaurus properly convert a markdown link to the required HTML so it "is not a broken anchor"?

For example, I have the markdown link [Width](./Velaptor.SizeU.md#width) with other variations that I have tried, but I always get the warning when building.

Here are the variations I have tried with no good results:

  1. [Width](./Velaptor.SizeU.md#width)
  2. [Width](./Velaptor.SizeU.md#Velaptor.SizeU.Width)
  3. [Width](Velaptor.SizeU.md#width)
  4. [Width](Velaptor.SizeU.md#Velaptor.SizeU.Width)
  5. [Width](Velaptor.SizeU.md#Velaptor.SizeU.Width 'Velaptor.SizeU.Width')
  6. [Width](Velaptor.SizeU.md#Width 'Velaptor.SizeU.Width')
  7. [Width](Velaptor.SizeU.md#width 'Velaptor.SizeU.Width')

Of course, this could be maybe understanding what a "broken anchor" means. Even though I get the warning, the anchor works locally with pnpm start, pnpm run serve, and in production.

So I guess I do need an explanation of what "broken anchor" means?

OzakIOne commented 6 months ago

I am having the same issue but I am using MD and MDX files. I guess I just don't understand why this would be an issue when using markdown files because they get transpiled by docusaurus. Wouldn't docusaurus properly convert a markdown link to the required HTML so it "is not a broken anchor"?

For example, I have the markdown link [Width](./Velaptor.SizeU.md#width) with other variations that I have tried, but I always get the warning when building.

Here are the variations I have tried with no good results:

  1. Width
  2. Width
  3. Width
  4. Width
  5. Width
  6. Width
  7. Width

Of course, this could be maybe understanding what a "broken anchor" means. Even though I get the warning, the anchor works locally with pnpm start, pnpm run serve, and in production.

So I guess I do need an explanation of what "broken anchor" means?

It's quite difficult to see what problem your are facing with the current information, perhaps we could help you more if you try to recreate your error with https://docusaurus.io/docs/playground

The feature basically collects each link in a page and then try to resolve the link with or without anchor and if nothing is found its considered as a broken link / anchor

I don't know how is your Velaptor.SizeU.md page but #widthshould refer to a markdown heading

// hello.md
# heading

a link to my [title](#heading)
// world.md
# heading

a link to my [hello heading](./hello#heading) // also should work depending on the path
a link to my [hello heading](hello#heading) // should work
CalvinWilkinson commented 6 months ago

I am having the same issue but I am using MD and MDX files. I guess I just don't understand why this would be an issue when using markdown files because they get transpiled by docusaurus. Wouldn't docusaurus properly convert a markdown link to the required HTML so it "is not a broken anchor"? For example, I have the markdown link [Width](./Velaptor.SizeU.md#width) with other variations that I have tried, but I always get the warning when building. Here are the variations I have tried with no good results:

  1. Width
  2. Width
  3. Width
  4. Width
  5. Width
  6. Width
  7. Width

Of course, this could be maybe understanding what a "broken anchor" means. Even though I get the warning, the anchor works locally with pnpm start, pnpm run serve, and in production. So I guess I do need an explanation of what "broken anchor" means?

It's quite difficult to see what problem your are facing with the current information, perhaps we could help you more if you try to recreate your error with https://docusaurus.io/docs/playground

The feature basically collects each link in a page and then try to resolve the link with or without anchor and if nothing is found its considered as a broken link / anchor

I don't know how is your Velaptor.SizeU.md page but #widthshould refer to a markdown heading

// hello.md
# heading

a link to my [title](#heading)
// world.md
# heading

a link to my [hello heading](./hello#heading) // also should work depending on the path
a link to my [hello heading](hello#heading) // should work

Sorry about that. I forgot to escape the markdown examples. I fixed my comment. I will also try and recreate the error in the playground.

CalvinWilkinson commented 6 months ago

@OzakIOne

I used the codebox.io playground to set up the broken link example.

codebox fork

This is the warning that I got in the terminal when running npm run build

[WARNING] Docusaurus found broken anchors!

Please check the pages of your site in the list below, and make sure you don't reference any anchor that does not exist.
Note: it's possible to ignore broken anchors with the 'onBrokenAnchors' Docusaurus configuration, and let the build pass.

Exhaustive list of all broken anchors found:
- Broken anchor on source page path = /docs/intro:
   -> linking to /docs/tutorial-basics/create-a-page#MyHeading
slorber commented 6 months ago

@CalvinWilkinson what you report here is not related to the original issue. I'm going to fold all these comments as "off topic", but feel free to open a separate issue/discussion if you want to.

I'd encourage you to join our Discord server in case you need community support on this, because what you encounter is not a bug, it's the expected behavior.


Your link is [My Heading](./tutorial-basics/create-a-page.md#MyHeading)

And the target page has # MyHeading

First, we don't create anchors for docs titles / top-level headings. You can just link to the page directly because that title should rather be at the top.

Basically, you can see it does not have a # next to it when you hover it. If you used ## MyHeading instead you would get an anchor.

CleanShot 2024-04-25 at 12 32 06@2x

If you want to change this behavior and have top-level headings create anchors, you can swizzle our @theme/Heading component.

Second, the anchors we generate are "sluggified" using the same logic GitHub uses. And anchors are case-sensitive. If you use ## My Heading then the anchor is #my-heading, not #My Heading

lachieh commented 3 months ago

Please note that is not valid anymore as per HTML5 spec, that's why I choose to not support it in our Link component.

I came across this issue because we include a bunch of generated content in our docs which includes a few anchors like this:

- <a name="item-1" /> Item 1

The tool that generates the content has a PR open to align to HTML5 spec, but in the meantime I made a quick little rehype plugin that will swap all name attributes to id instead. This is definitely a preference but it helped remove all the errors and still works as intended.

Here is the code if anyone also has this issue:

rehypeNameToId.ts:

import { Plugin } from 'unified';
import { hasProperty } from 'hast-util-has-property';
import { Nodes } from 'hast';
import { visit } from 'unist-util-visit';

const rehypeNameToId: Plugin<never> = () => {
  return (tree) => {
    visit(tree, 'element', (node: Nodes) => {
      if (hasProperty(node, 'name')) {
        node.properties.id = node.properties.name;
        node.properties.name = '';
      }
    });
  };
};

export default rehypeNameToId;

docusaurus.config.ts:

import rehypeNameToId from './rehypeNameToId.ts'

export default {
  // repeat in preset, docs-plugins, blog-plugins, etc.
  presets: [
    [
      'classic',
      {
        blog: {
          rehypePlugins: [rehypeNameToId],
        }
      },
    ]
  ]
}
anonhostpi commented 2 months ago

First, we don't create anchors for docs titles / top-level headings. You can just link to the page directly because that title should rather be at the top.

So, all h1s are not anchorable? That seems counterintuitive.

I understand not allowing anchors for the title and possibly the first h1, but all h1s? h1s aren't for titling documents, they are for formatting and navigation.

More specifically, markdown's headings are based off of headings in word documents. All top-level h1 headings in Microsoft Word are "anchored" and can be navigated to with hyperlinks in the word document. There's no reason that h1s should be treated different here.

I do agree that anchoring the first h1 is pointless, but it is also harmless.

slorber commented 2 months ago

@anonhostpi

So, all h1s are not anchorable? That seems counterintuitive.

I understand not allowing anchors for the title and possibly the first h1, but all h1s?

It's usually a best practice to only have one h1 per page

h1s aren't for titling documents, they are for formatting and navigation.

Oh really? Can you back this up with reference material from an SEO and accessibility authority? For example from Google?

More specifically, markdown's headings are based off of headings in word documents. All top-level h1 headings in Microsoft Word are "anchored" and can be navigated to with hyperlinks in the word document. There's no reason that h1s should be treated different here.

We are not a text editor, and don't particularly plan to inspire from them.

And remember h1 are not anchorable by default on the web. If you don't like that, then maybe propose to browser vendors to make all headings anchorable by default?

I do agree that anchoring the first h1 is pointless, _but it is also harmless.

If it prevents you from introducing an antipattern to your own site, we can even consider this restriction as a feature.


The good news is that Docusaurus is flexible enough to let you plug your own opinions if you don't agree with our defaults.

Josh-Cena commented 2 months ago

And remember h1 are not anchorable by default on the web.

OOC, are other headings anchorable by default? Don't they all require ids?

Also, "should multiple h1s be allowed" has been a long flame war on MDN, but the opinion of the writers is also that there's no legitimate use case of multiple h1s.

anonhostpi commented 2 months ago

We are not a text editor, and don't particularly plan to inspire from them.

And remember h1 are not anchorable by default on the web. If you don't like that, then maybe propose to browser vendors to make all headings anchorable by default?

Even if you are not a text editor, blocking anchoring doesn't follow the outline behavior specified by the HTML specification:

This is why I said headings are for navigation.

If it prevents you from introducing an antipattern to your own site, we can even consider this restriction as a feature.

I would argue that allowing multiple h1s while simultaneously not allowing anchors on secondary h1s is an antipattern itself.

anonhostpi commented 2 months ago

The HTML specification can be poor at defining semantics and often relies on the English language for that. So, to further drive my point that headings are for formatting, here is 2 definitions from the Cambridge dictionary stating that headings are for titling sections of text (not documents)

https://dictionary.cambridge.org/us/dictionary/english/heading

Josh-Cena commented 2 months ago

This is for general typesetting. It has nothing to do with web page structure. You can find numerous resources:

<h1> should be functionally equivalent to the page's <title>, only displayed to the user. Everything else comes as <h2>.

Josh-Cena commented 2 months ago

Anyway none of this really pertain to Docusaurus. This issue is about <a name="..."> which is explicitly invalid HTML. We are only keeping it open to improve the DX by recommending alternatives. <h1>s don't get IDs, and we are unlikely to differentiate between "first h1" and "later h1s" because that shouldn't matter.

anonhostpi commented 2 months ago

It does matter. h1-h6s are not only for formatting, but also for navigation. Not allowing anchoring goes outside the intended outlining behavior defined by the HTML spec. If a user creates a link to a secondary h1, that link should work. Otherwise, it's non-compliant:

The outline should be used for generating document outlines, for example when generating tables of contents. When creating an interactive table of contents, entries should jump the user to the relevant heading. - 4.3.11 Headings and outlines, HTML Specification

This also applies to links within the document.

EDIT: More to its importance, header and outline navigation is what fragment identifiers (anchor links) were originally intended for: see the examples in the older HTML4 spec

Even if we are argue that this isn't an HTML renderer, I would not only argue that I expect the h1 behavior of docusaurus to be consistent with other markdown renderers that are compliant, I would not call breaking a link to a secondary h1 "accessible" design.

However, given that multi-h1 usage isn't desired (as you have asserted), I would recommend blocking, limiting, or warning the user about multi-h1 usage, and/or notify the user that h1's are not navigable (despite the typical behavior).

slorber commented 2 months ago

I'd suggest opening a different issue/discussion for the case of h1 headings in particular.

Let's keep this issue clean related to onBrokenAnchors reporting #anchor as a broken link for <a id="anchor"/>.

I'll hide all the h1-related discussions above.