facebook / docusaurus

Easy to maintain open source documentation websites.
https://docusaurus.io
MIT License
56.57k stars 8.49k forks source link

Script attribute value with quotes is escaped #10172

Closed aSemy closed 5 months ago

aSemy commented 5 months ago

removed

dfinster commented 5 months ago

I'm also running into this broken JSON object when using Cloudflare Analytics on Docusaurus 3.3.2. Unfortunately, I can't share our source code until the site is made public later this year.

It seems strange that I only find one other person (@aSemy) with this issue when I search for instances in docusaurus.config.ts. It seems like it would be more common than that. Maybe Cloudflare Analytics isn't as popular as I thought, or maybe people are using automatic mode, which can directly inject the beacon for sites proxied through Cloudflare.

Other attempts

I found a few more examples when expanding the search from Typescript to Javascript (docusaurus.config.js). Some people ignored the issue and published sites with broken JSON objects. Other people tried various approaches to fix it:

Things I tried

I tried escaping with backticks (point 1 above) because it was a quick test. I can confirm it doesn't work. I also tried swizzling (wrap mode) the footer component to insert the link:

// theme/footer/index.tsx

import React from 'react';
import Footer from '@theme-original/Footer';
import type FooterType from '@theme/Footer';
import type { WrapperProps } from '@docusaurus/types';

type Props = WrapperProps<typeof FooterType>;

export default function FooterWrapper(props: Props): JSX.Element {
  return (
    <>
      <Footer {...props} />
      <script
        defer
        src='https://static.cloudflareinsights.com/beacon.min.js'
        data-cf-beacon='{"token": "00000000000000000redacted0000000"}'
      ></script>
    </>
  );
}

This swizzle attempt didn't work.

My conclusion

At this point, I think the "least worst" option for hosting the beacon script in Docusaurus is @aSemy's workaround. I'll keep monitoring this issue and hope there's a cleaner solution someday.

Other options might be:

Cloudflare's automatic injection might be how most people do it. 🤷🏻

slorber commented 5 months ago

It is possible there is a bug in the way we escape HTML attribute values. It is quite old historical legacy string concat code, and that's not related to ETA https://github.com/facebook/docusaurus/blob/61f71f6b69a36350dc0c28fb291dfc591904ee27/packages/docusaurus/src/server/htmlTags.ts#L69

However, are we even sure the output of Docusaurus is wrong.


<!doctype html>
<html lang="en" dir="ltr" class="plugin-pages plugin-id-default" data-has-hydrated="false">
<head>
  <meta charset="UTF-8">
  <meta name="generator" content="Docusaurus v3.3.2">
  <script 
    id="testscript" 
    defer="defer" 
    src="https://static.cloudflareinsights.com/beacon.min.js" 
    data-cf-beacon="{&quot;token&quot;:&quot;1234&quot;}"></script>
<head>
<body>body</body>
</html>

For browsers, the following HTML looks valid and gets decoded properly to a JSON value.

CleanShot 2024-05-30 at 12 27 36

CleanShot 2024-05-30 at 12 28 11

Can you prove that the output of Docusaurus is wrong, and that it's not a bug in the script you load?

Is there anything written in the HTML spec mentioning that &quot; encoding is invalid as an attribute?

Please show me the problem outside of Docusaurus:

TLDR I need a minimal repro, probably not even involving Docusaurus

aSemy commented 5 months ago

removed

slorber commented 5 months ago

When the attribute quotes are escaped as ", I get zero hits in Cloudflare Web Analytics. After I applied a workaround, I see hits in Cloudflare.

What I mean is: how can you be sure Docusaurus is the problem, and not Cloudflare Web Analytics?

I believe what we do is fine, and that the bug might be somewhere else.

Yes, I'm asking you to troubleshoot the script Cloudflare provides you.

https://gist.github.com/slorber/bec62fcfb278d5a569e2d2c9c716504a

CleanShot 2024-05-30 at 15 46 59

CleanShot 2024-05-30 at 15 49 34

There might be other places in that script that break, but the code above found in the script seems to work fine.

If something is broken, please tell me what exactly by isolating things. You shouldn't need to depend on Docusaurus nor a third-party cloudflare script to prove your point, and a simple HTML file should be enough to reproduce the problem, if there is a problem in the first place.


If the encoding we use is fine and spec-compliant, then it is unfair to ask us to change it to work around a bug found in another system.

If it is proven that our encoding is wrong, then we'll make a change. And having 0 analytics on Cloudflare does not prove that the encoding is wrong.

aSemy commented 5 months ago

Actually, never mind. I have a workaround and I don't want to engage with you any more. Please don't contact me.

dfinster commented 5 months ago

I'm sad that we've lost the comment history here, because it's a good opportunity for me to learn new things.

Namely:

  1. A technique I didn't know about for injecting code with defaultSSRTemplate.replace()
  2. The possibility that an HTML-encoded JSON object can be correctly decoded with JSON.parse().

We are talking about the difference between these two strings appearing in the rendered HTML:

I made the assumption that the second example would break the script, but I didn't thoroughly test it. @slorber suggests that it's valid. When I have time I'll make a minimum viable example to test this out. If true, then I can strike the word "possibility" above and I will have learned a new thing. That's exciting to me.

All I really know at this point is that scripts: [] in docusaurus.config.ts HTML-encodes the double quotes, as do JSX/TSX components. I think @slorber is correct to ask for a reproducible example, but it looks like (from what I remember of the now-deleted comments) that there was a misunderstanding about what was being requested and why.

Peace to you all.

slorber commented 5 months ago

The possibility that an HTML-encoded JSON object can be correctly decoded with JSON.parse().

It's not really HTML-encoded JSON, it's just JSON that is transferred as part of an HTML payload. It's not JSON.parse() that decodes it, once it's in the DOM it's already decoded.

dfinster commented 5 months ago

It's not really HTML-encoded JSON, it's just JSON that is transferred as part of an HTML payload. It's not JSON.parse() that decodes it, once it's in the DOM it's already decoded.

That clarification helps, thank you for spending time explaining it. Truly appreciated.