facebook / docusaurus

Easy to maintain open source documentation websites.
https://docusaurus.io
MIT License
56.69k stars 8.53k forks source link

Wrong markdown prevents link to get parsed correctly #9045

Closed alexandercerutti closed 1 year ago

alexandercerutti commented 1 year ago

Have you read the Contributing Guidelines on issues?

Prerequisites

Description

Hello there, thanks for Docusaurus! This is really great. Yesterday I pushed a version tag which triggered a pipeline to publish documentation on Github Pages.

The build suddenly started failing without the docs having been touched.

On both CI and local, the error reported what follows:

Please check the pages of your site in the list below, and make sure you don't reference any path that does not exist.
Note: it's possible to ignore broken links with the 'onBrokenLinks' Docusaurus configuration, and let the build pass.

Exhaustive list of all broken links found:

- On source page path = /Trackers/MyService2/:
   -> linking to ../../session-data.md (resolved as: /session-data.md)

Note that my configuration includes these two properties:

{
    ...
    onBrokenLinks: "throw",
    onBrokenMarkdownLinks: "throw",
    ...
}

When running docusaurus start, it wasn't returning the error but the link wasn't being resolved correctly: I could see the link pointing to raw ../../session-data.md instead of /session-data. So obliviously it was failing.

What's weird is that the link was existing and other pages on the same tree had the same link working.

My folder tree is this one:

.
├── Trackers
│   ├── MyService1
│   │   └── README.md
│   ├── MyService2
│   │   └── README.md
│   ├── MyService3
│   │   └── README.md
│   ├── MyService4
│   │   └── README.md
│   ├── add-a-tracker.md
│   └── README.md
├── intro.md
├── errors-reference.md
└── session-data.md

So I started to look back in git to see when the docs was edited the last time but at a first look I didn't see anything wrong (but actually there was something wrong). As I forgot to push the previous version tag, I didn't see the error earlier.

So I started debugging the project by running it with NODE_OPTIONS="--inspect-brk" but after several hours of debugging, I definitely failed to understand the whole picture and why it was failing.

Then I found out there was a markdown syntax error only on that specific page: a set of ``` that wasn't getting closed (as you can see from the repro case below, on this line).

So, it is fine to me that there was an error there so it failed compiling. What's odd to me is the error message and the fact that the compiling process reached such point without giving any hint about wrong markdown issue.

Another odd thing to me is that it wasn't actually a real syntax error because the html box can get generated without any problem (just a UI issue to me).

Clearly, if the link is put before the syntax error, it gets resolved correctly.

I don't know if this is an issue about docusaurus or a dependency, but I wasn't able to go any further in my investigation.

Thank you!

Reproducible demo

https://stackblitz.com/edit/github-qtdreu?file=docs%2FMyService%2FREADME.md

Please note that I had to disable on stackblitz the auto-formatting on save.

Steps to reproduce

  1. Create two pages. The second should have a link to the first one
  2. Insert in the second page, before the link, a wrong markdown (see the repro case, it might be a problem with preformatted code block)

Expected behavior

A better error message or at least links getting resolved correctly, but according to what I wrote above, I don't actually know what should I really expect.

Actual behavior

A completely unrelated error message:

Please check the pages of your site in the list below, and make sure you don't reference any path that does not exist.
Note: it's possible to ignore broken links with the 'onBrokenLinks' Docusaurus configuration, and let the build pass.

Exhaustive list of all broken links found:

- On source page path = /Trackers/MyService2/:
   -> linking to ../../session-data.md (resolved as: /session-data.md)

Your environment

Self-service

Josh-Cena commented 1 year ago

Your repro contains unpaired fences:

## Import via CDN

```html
<script src="https://myservice-core.umd.js"></script>
<script src="https://myservice.umd.js"></script>

This script is also available for ES5:

```html
<script src="https://myservice.umd.es5.js"></script>
```

## Activation and deactivation

This service gets automatically triggeredwhen [session data](../session-data.md) gets updated and certain conditions are honored.

We may be able to fix this, but it's not obligatory for us to handle non-well-formed Markdown.

alexandercerutti commented 1 year ago

Hey @Josh-Cena, thanks for prompt reply!

What's actually weird is that this seems to be a valid Markdown on the HTML, but that doesn't explain to me the connection with link resolution.

It is okay for me if your don't handle the Markdown error, but I feel like there must be something else wrong somewhere in the middle, that prevents the link to get resolved if this situation happens.

Josh-Cena commented 1 year ago

I'm preparing a PR already. Note that Markdown is very resilient: whatever the input is, the parser almost never throws a syntax error. The problem here is that Docusaurus believes the link is inside a code block (because it has encountered three code fences so far, so what comes next is probably code), so we don't replace the text, while to the actual Markdown parser, the second fence actually does not close the first fence.

alexandercerutti commented 1 year ago

So, if I understood correctly, there are two "actors" that read the parsed markdown structure (I expect to be there a metastructure - for tokens - somewhere if the parsing phase happens once, or two different markdown parsers) and handle it in two different ways?

An actor converts markdown to html, the other checks the links and other things for docusaurus, right?

Thanks for the PR!

Josh-Cena commented 1 year ago

Yes, you are right! Docusaurus actually has its own way of parsing Markdown links that does not use a full-blown Markdown parser. It's more performant but also means lots of bugs and inconsistencies. Sorry if it wasn't obvious previously.

alexandercerutti commented 1 year ago

Ugh, I hate these situations ahah Having two "flows" that do almost the same thing without a solution that could satisfy both, is something that code-smells to me!

Anyway, IMHO since docusaurus is a tool I'd probably sacrifice performance for correctness, but I'd also have just a parser.

Anyway, thank your for your support in triaging this issue 😄

Josh-Cena commented 1 year ago

We've tried using a proper parser before, but it didn't always work because we can't use the global Markdown configuration here. It is planned though. V3 will be about MDX v2, and then we may investigate further for a better Markdown infrastructure.

slorber commented 1 year ago

Helpful issue to track for a better md linkification process: https://github.com/facebook/docusaurus/issues/9048