Closed robert-gruner closed 3 months ago
Hi @robert-gruner
decodeEntities: true
. It is the default for Cheerio and we did not change that.decodeEntities: false
, but if there were it would definitely not be a CSP issue.I read that thread about this issue: https://github.com/cheeriojs/cheerio/issues/1198.
Long story short, it looks like decodeEntities: false
would be the right setup.
So I believe that we should change the option.
If you want to submit a PR feel free to do so. It would be great if you can include a test case so that we can check that it doesn't break existing sites. Something along the lines of the example in the discussion I was mentioning would be perfect :
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Document</title>
</head>
<body>
<br />
<pre>
// 哈哈哈
// swift generic
var a5 = MemoryLayout<UInt>.size
</pre>
</body>
</html>
You can also include some German umlauts to verify it addresses your issues
Update
The more I think about it, the more I believe decodeEntities: true
was actually horribly wrong.
Thanks @robert-gruner for spotting this. We should definitely fix.
Thanks for the ultra fast responses @vejja! ⚡️ So the PR should "just" change the usage of Cheerio and add tests right?
Since I am new to this repo setup might take a while and because of the project situation it is not very probable that I will be able to contribute this week. So if you are faster feel free, I can support with code review/testing.
Hey @robert-gruner
Thanks for this feature idea!
After investigation from @vejja I think this could be really useful to add for the module!
@vejja would you be able to do it this week or should we wait for Robert next week?
I am up for jumping in working on it as well as I see a lot of value in it :)
@robert-gruner : just quick-fixed it on https://github.com/Baroshem/nuxt-security/tree/393-integrate-option-to-disable-cheerio-decodeentities Would you be able to test the branch there on your sources to confirm it fixes the umlaut issue ?
Hey @robert-gruner
Thanks for this feature idea!
After investigation from @vejja I think this could be really useful to add for the module!
@vejja would you be able to do it this week or should we wait for Robert next week?
I am up for jumping in working on it as well as I see a lot of value in it :)
@baroshem I just patched decodeEntities
with default false
as I actually don't think it's a good idea to leave it as true
.
@robert-gruner : just quick-fixed it on https://github.com/Baroshem/nuxt-security/tree/393-integrate-option-to-disable-cheerio-decodeentities Would you be able to test the branch there on your sources to confirm it fixes the umlaut issue ?
Yes! I can confirm that this setting solves our issue. All parts of the HTML are affected by the way. So even in Head Meta Tags entities are not hex encoded any more.
Just an example:
with decodeEntities: false
for example we have such meta tags
<meta property="og:description" content="Die Meerestemperatur rund um das größte Korallenriff ist derzeit überdurchschnittlich hoch. Das setzt die Korallen unter „Stress“ und könnte zu einer Bleiche führen – der vierten in nur sechs Jahren.">
decodeEntities: true
would lead to
<meta property="og:description" content="Die Meerestemperatur rund um das größte Korallenriff ist derzeit überdurchschnittlich hoch. Das setzt die Korallen unter „Stress“ und könnte zu einer Bleiche führen – der vierten in nur sechs Jahren.">
Awesome guys, thanks!
I will publish a patch version with this fix :)
Released in 1.2.2. Thanks guys! 🚀
Hi everybody,
first of all big thanks for this high quality Nuxt module! We have integrated it successfully into a project where we build a news website.
Problem
Our SEO team has informed us, that since we enabled some Security features (CSP), our server side generated HTML contains altered encoding of certain special characters (e.g. German Umlauts like ä,ö,ü). So we have a difference in SSR HTML and the one which is in the browser.
I debugged a bit and found out that this module uses Cheerio HTML Parsing under the hood. The "problematic" part is https://github.com/Baroshem/nuxt-security/blob/2172055bd282e9b1b19fa033cbecffdd5d66a996/src/runtime/nitro/plugins/02a-preprocessHtml.ts#L22
If we change these options to include
decodeEntities: false
we get the previous behaviour back.Proposition
I understand that it might be a good idea to use the default settings and not change it for all the users. But we would like to include an option somehow to toggle/expose this setting.
Questions
decodeEntities: true
? Was it a conscious decision?If you need some more info let me know. I could also try to implement the change on my own if you give me some expecation about the config/API changes.
All the best, Robert ✌️