Baroshem / nuxt-security

🛡 Automatically configure your app to follow OWASP security patterns and principles by using HTTP Headers and Middleware
https://nuxt-security.vercel.app/
MIT License
738 stars 56 forks source link

Integrate option to disable Cheerio decodeEntities #393

Closed robert-gruner closed 3 months ago

robert-gruner commented 3 months ago

Hi everybody,

first of all big thanks for this high quality Nuxt module! We have integrated it successfully into a project where we build a news website.

Problem

Our SEO team has informed us, that since we enabled some Security features (CSP), our server side generated HTML contains altered encoding of certain special characters (e.g. German Umlauts like ä,ö,ü). So we have a difference in SSR HTML and the one which is in the browser.

I debugged a bit and found out that this module uses Cheerio HTML Parsing under the hood. The "problematic" part is https://github.com/Baroshem/nuxt-security/blob/2172055bd282e9b1b19fa033cbecffdd5d66a996/src/runtime/nitro/plugins/02a-preprocessHtml.ts#L22

If we change these options to include decodeEntities: false we get the previous behaviour back.

Proposition

I understand that it might be a good idea to use the default settings and not change it for all the users. But we would like to include an option somehow to toggle/expose this setting.

Questions

If you need some more info let me know. I could also try to implement the change on my own if you give me some expecation about the config/API changes.

All the best, Robert ✌️

vejja commented 3 months ago

Hi @robert-gruner

I read that thread about this issue: https://github.com/cheeriojs/cheerio/issues/1198. Long story short, it looks like decodeEntities: false would be the right setup. So I believe that we should change the option.

If you want to submit a PR feel free to do so. It would be great if you can include a test case so that we can check that it doesn't break existing sites. Something along the lines of the example in the discussion I was mentioning would be perfect :

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Document</title>
  </head>
  <body>
    <br />
    <pre>
      // 哈哈哈
      // swift generic
      var a5 = MemoryLayout&lt;UInt&gt;.size
    </pre>
  </body>
</html>

You can also include some German umlauts to verify it addresses your issues

vejja commented 3 months ago

Update The more I think about it, the more I believe decodeEntities: true was actually horribly wrong.

Thanks @robert-gruner for spotting this. We should definitely fix.

robert-gruner commented 3 months ago

Thanks for the ultra fast responses @vejja! ⚡️ So the PR should "just" change the usage of Cheerio and add tests right?

Since I am new to this repo setup might take a while and because of the project situation it is not very probable that I will be able to contribute this week. So if you are faster feel free, I can support with code review/testing.

Baroshem commented 3 months ago

Hey @robert-gruner

Thanks for this feature idea!

After investigation from @vejja I think this could be really useful to add for the module!

@vejja would you be able to do it this week or should we wait for Robert next week?

I am up for jumping in working on it as well as I see a lot of value in it :)

vejja commented 3 months ago

@robert-gruner : just quick-fixed it on https://github.com/Baroshem/nuxt-security/tree/393-integrate-option-to-disable-cheerio-decodeentities Would you be able to test the branch there on your sources to confirm it fixes the umlaut issue ?

vejja commented 3 months ago

Hey @robert-gruner

Thanks for this feature idea!

After investigation from @vejja I think this could be really useful to add for the module!

@vejja would you be able to do it this week or should we wait for Robert next week?

I am up for jumping in working on it as well as I see a lot of value in it :)

@baroshem I just patched decodeEntities with default false as I actually don't think it's a good idea to leave it as true.

robert-gruner commented 3 months ago

@robert-gruner : just quick-fixed it on https://github.com/Baroshem/nuxt-security/tree/393-integrate-option-to-disable-cheerio-decodeentities Would you be able to test the branch there on your sources to confirm it fixes the umlaut issue ?

Yes! I can confirm that this setting solves our issue. All parts of the HTML are affected by the way. So even in Head Meta Tags entities are not hex encoded any more.

Just an example: with decodeEntities: false for example we have such meta tags

<meta property="og:description" content="Die Meerestemperatur rund um das größte Korallenriff ist derzeit überdurchschnittlich hoch. Das setzt die Korallen unter „Stress“ und könnte zu einer Bleiche führen – der vierten in nur sechs Jahren.">

decodeEntities: true would lead to

<meta property="og:description" content="Die Meerestemperatur rund um das gr&#xf6;&#xdf;te Korallenriff ist derzeit &#xfc;berdurchschnittlich hoch. Das setzt die Korallen unter &#x201e;Stress&#x201c; und k&#xf6;nnte zu einer Bleiche f&#xfc;hren &#x2013; der vierten in nur sechs Jahren.">
Baroshem commented 3 months ago

Awesome guys, thanks!

I will publish a patch version with this fix :)

Baroshem commented 3 months ago

Released in 1.2.2. Thanks guys! 🚀