Closed Zamiell closed 1 year ago
This is an MDX issue. You can try it in the playground here: https://mdx-git-renovate-babel-monorepo-mdx.vercel.app/playground/ It generates something like:
<h2>{`a`}{`_`}{`b`}{`_`}{`c`}</h2>
That is multiple text nodes, and when the JSX is transformed to HTML, the output also contains multiple text nodes.
We have a very similar issue here: https://github.com/facebook/docusaurus/issues/8617
I personally do not see a way we can fix this, and believe it should be considered a crawler bug. cc @shortcuts
I realized today that Prettier is smart enough to remove unnecessary escape characters when formatting Markdown files.
Thus, a solution for my use-case is to insert Prettier into my Docusaurus pipeline. (In other words, I ensure that output from TypeDoc is formatted with Prettier before feeding it to Docusaurus.)
Now, I no longer get the broken-up nodes.
> I personally do not see a way we can fix this
I'll close the issue then for now, thanks Josh.
Note: MDX 2 doesn't seem to create multiple text nodes anymore, so Docusaurus v3 might fix this.
Have you read the Contributing Guidelines on issues?
Prerequisites
I have tried the npm run clear or yarn clear command.
I have tried rm -rf node_modules yarn.lock package-lock.json and re-installing packages.
Description
I have some Markdown content like this:
For reference, this content was generated by TypeDoc, a popular documentation generation tool. TypeDoc puts escape characters before underscores, because underscores have semantic meaning in Markdown - they transform the text to be either bold or italic. So I would consider this escaping behavior to be "correct" from TypeDoc.
I feed this Markdown content to Docusaurus, and it creates a website for me. The resulting HTML looks like this:
This is strange, and appears to be a bug. I would naively expect this element to instead simply be TEAR_FALLING_SPEED.
Presumably, this behavior is an artifact of the escape characters. Visually, the webpage looks fine, as the end user cannot tell that the text is not actually contiguous. However, when scraping the website with the Algolia/Typesense scraper, it chokes on this content and cannot index it properly. Thus, when a user searches for "TEAR_FALLING_SPEED", there are no matches, because all the indexer saw was "TEAR", "FALLING", and "SPEED".
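The scraper's behavior can be illustrated with a short sketch. Assuming the text nodes end up separated in the rendered page (for example by HTML comments, as React's server rendering typically emits between adjacent text children), a parser that collects character data per text run sees the pieces individually. This is only an illustration of the failure mode, not the Algolia/Typesense scraper's actual code:

```python
from html.parser import HTMLParser

class TextChunkCollector(HTMLParser):
    """Collects each contiguous run of character data as a separate chunk."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called once per text run; comments between runs split them apart.
        self.chunks.append(data)

# Hypothetical server-rendered output: comments keep the text nodes apart.
html = "<h2>TEAR<!-- -->_<!-- -->FALLING<!-- -->_<!-- -->SPEED</h2>"

collector = TextChunkCollector()
collector.feed(html)
print(collector.chunks)  # ['TEAR', '_', 'FALLING', '_', 'SPEED']
```

A naive indexer that treats each chunk as its own token would index "TEAR", "FALLING", and "SPEED" separately, which matches the search failure described above.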
As previously mentioned, underscores carry special semantic meaning in Markdown content, but they do not carry any special meaning in HTML content. Thus, I suspect that Docusaurus is doing too much here. Instead of breaking the content up into multiple tokens, it should be able to see that there is an unnecessary escape before an underscore, and simply remove it.
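The suggested cleanup could look something like this simplified sketch. It only handles the backslash-underscore case from this report, and a real implementation (or Prettier itself, as used in the workaround above) would also need to skip code spans, fenced code blocks, and other contexts where the backslash is literal:

```python
import re

def unescape_underscores(markdown: str) -> str:
    """Remove backslash escapes before underscores.

    Simplified illustration only: this does not skip code spans or
    fenced code blocks, which a real tool would have to respect.
    """
    return re.sub(r"\\_", "_", markdown)

print(unescape_underscores(r"## TEAR\_FALLING\_SPEED"))
# ## TEAR_FALLING_SPEED
```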
Your environment
Self-service