flavorjones / loofah

Ruby library for HTML/XML transformation and sanitization
MIT License

Built-in scrubbers don't escape unsafe HTML with Nokogiri > 1.15 #276

Closed stefannibrasil closed 11 months ago

stefannibrasil commented 1 year ago

Not sure if this is the right place to file this bug or if it should be filed on Nokogiri. Let me know if it doesn't belong here :)

I use a combination of built-in and custom scrubbers (prune, noopener, nofollow, unprintable). This week Dependabot opened a PR on the app that uses those scrubbers to bump nokogiri from 1.14.3 to 1.15.4. One of our tests started failing because the unsafe value in its input was no longer being escaped.

Here is an example of a test with Nokogiri 1.15.0:

        context ":escape" do
          it "escape bad tags" do
            doc = klass.parse("<html><body><a href=\"http://www.owasp.org?test=$varUnsafe\">link</a></body></html>")
            result = doc.scrub!(:escape)

            assert_equal "<a href=\"http://www.owasp.org?test=%24varUnsafe\">link</a>", result.xpath("/html/body").inner_html
          end
        end

Result:


  2) Failure:
Loofah::HTML4::Document::#scrub!:::escape#test_0001_escape bad tags [/Users/stefannibrasil/loofah/test/integration/test_scrubbers.rb:142]:
--- expected
+++ actual
@@ -1 +1 @@
-"<a href=\"http://www.owasp.org?test=%24varUnsafe\">link</a>"
+"<a href=\"http://www.owasp.org?test=$varUnsafe\">link</a>"

I believe this is a bug because we want the variable to be escaped. The test passes with Nokogiri 1.13 and 1.14 and fails once the Nokogiri version is set to >= 1.15.

Not sure what the fix is. If there is anything I can help with, please let me know. Thanks!

flavorjones commented 12 months ago

Hi @stefannibrasil! This looks like an upstream change in libxml2.

The maintainer calls out that $, [, and ] should not have been escaped, and are no longer as of libxml2 v2.11.0, which shipped in Nokogiri v1.15.0.

Unfortunately, since this is libxml2's behavior, it's not easy for nokogiri or loofah to change that behavior.
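If a test needs to pass on both sides of that libxml2 change, one option is to compare the decoded query parameters instead of the raw serialized string. The following is only a sketch using Ruby's standard `uri` library; `decoded_query` is a hypothetical helper, not part of Loofah or Nokogiri, and the two URLs are the expected and actual values from the failing test above.

```ruby
require "uri"

# Hypothetical helper (not a Loofah or Nokogiri API): decode the query
# string of a URL into name/value pairs, so "%24" and "$" compare equal.
def decoded_query(url)
  URI.decode_www_form(URI.parse(url).query.to_s)
end

# Serialization seen before libxml2 v2.11.0 ("$" percent-encoded).
old_output = "http://www.owasp.org?test=%24varUnsafe"
# Serialization seen from libxml2 v2.11.0 onward ("$" left as-is).
new_output = "http://www.owasp.org?test=$varUnsafe"

# Both forms decode to the same parameter list.
same = decoded_query(old_output) == decoded_query(new_output)
puts same
```

Comparing decoded values keeps the assertion about what the URL means rather than about which optional percent-encodings the serializer happens to apply.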

Can you help me understand why you think escaping that character is useful for security/sanitization?

stefannibrasil commented 11 months ago

Ahhh, thank you very much for the context, that's really helpful. So many things to learn 📚

> Can you help me understand why you think escaping that character is useful for security/sanitization?

This was totally a misunderstanding on my end. I was reading the OWASP cheat sheet, which says:

“HTML Context” refers to inserting a variable between two basic HTML tags like a <div> or <b>. For example..
<div> $varUnsafe </div>
An attacker could modify data that is rendered as $varUnsafe. This could lead to an attack being added to a webpage.. for example.

<div> <script>alert`1`</script> </div> // Example Attack
In order to add a variable to a HTML context safely, use HTML entity encoding for that variable as you add it to a web template.

However, sanitization would strip out the script tag anyway, so <div> $varUnsafe </div> would not be a security risk. It would only be a risk if the value were rendered without any prior sanitization.
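To make the distinction concrete, here is a small sketch of the HTML entity encoding the cheat sheet refers to, using `ERB::Util.html_escape` from Ruby's standard library (my choice for illustration; the cheat sheet does not name a specific function). It neutralizes the example attack but leaves a literal `$` untouched, since `$` has no special meaning in HTML.

```ruby
require "erb"

# The example attack from the cheat sheet, as it might arrive in a variable.
payload = "<script>alert`1`</script>"

# Entity-encode the markup-significant characters (&, <, >, ", ').
escaped_payload = ERB::Util.html_escape(payload)
puts escaped_payload # prints: &lt;script&gt;alert`1`&lt;/script&gt;

# A "$" is not markup-significant, so it passes through unchanged.
escaped_dollar = ERB::Util.html_escape("$varUnsafe")
puts escaped_dollar # prints: $varUnsafe
```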

I had a test asserting that $ characters would be escaped, but after reading the docs you shared, I realize now it's not a security risk. Thank you so much for sharing. I will close this issue and move on with upgrading Nokogiri 👀