flavorjones / loofah

Ruby library for HTML/XML transformation and sanitization
MIT License

Allow boolean and empty attributes for certain node types #278

Open dedene opened 12 months ago

dedene commented 12 months ago

We are using Loofah in a number of projects where the scrubbing of empty attributes and boolean attributes became an issue. This PR adds support for boolean attributes and empty string values on certain node types. It fixes #242.

For example, <option value="">Empty Value</option> is perfectly safe HTML, but the empty value was stripped when using the scrubber.

It also adds support for boolean attributes (e.g. download on an <a> element, or autoplay on a <video> tag). I could not get Nokogiri to output them as boolean attributes, but the HTML5 specification (section 3.2.2) specifies that an empty string value is also fine.

# Before this PR:
>> Loofah.html5_fragment('<option value="" selected></option>').scrub!(:strip).to_s
=> "<option></option>"

# After this PR:
>> Loofah.html5_fragment('<option value="" selected></option>').scrub!(:strip).to_s
=> "<option value=\"\" selected=\"\"></option>"
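
To illustrate the idea of allowing empty values only for certain attributes on certain node types, here is a minimal sketch in plain Ruby. It is not the code from this PR and does not use Loofah's or Nokogiri's API: the constants BOOLEAN_ATTRIBUTES and EMPTY_VALUE_ATTRIBUTES and the method empty_attribute_allowed? are hypothetical names, and the allowlists contain only the examples mentioned above.

```ruby
require "set"

# Hypothetical allowlists, for illustration only. They are NOT Loofah's
# actual data; they contain just the examples named in this description.
BOOLEAN_ATTRIBUTES = {
  "a"      => Set["download"],
  "video"  => Set["autoplay"],
  "option" => Set["selected"],
}.freeze

EMPTY_VALUE_ATTRIBUTES = {
  "option" => Set["value"],
}.freeze

# Returns true when an attribute with an empty string value may be kept
# on the given element, i.e. when the element/attribute pair appears in
# one of the allowlists above. Names are compared case-insensitively.
def empty_attribute_allowed?(element_name, attribute_name)
  element = element_name.downcase
  attribute = attribute_name.downcase

  [BOOLEAN_ATTRIBUTES, EMPTY_VALUE_ATTRIBUTES].any? do |allowlist|
    allowlist.fetch(element, Set.new).include?(attribute)
  end
end
```

Under these assumptions, empty_attribute_allowed?("option", "value") is true, while empty_attribute_allowed?("div", "value") is false, so an empty attribute on any element outside the allowlists would still be scrubbed as before.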

The behaviour from https://github.com/flavorjones/loofah/pull/51 is still the same, so the risk of unwanted regressions is minimal imho.

The tests on GitHub Actions seem to fail for truffleruby. But that seems to be related to https://github.com/ruby/stringio/pull/71, which just got merged, and is not related to the actual code changes in this PR.

Feel free to make or suggest changes if needed. Thanks a lot for having a look at this!

dedene commented 12 months ago

(Regarding the force-push: I've squashed my changes so far into a single commit.)

flavorjones commented 12 months ago

Thanks for submitting this! It may be a day or two before I'm able to review.

flavorjones commented 10 months ago

@dedene Thank you for your patience!

So I'm proceeding carefully here for the moment, since Rails::HTML::Sanitizer is sensitive to empty/boolean attributes. See https://github.com/rails/rails-html-sanitizer/pull/136 for the original description of the general problem back in June 2022.

I had been waiting for HTML5 parsing to land in the sanitizer stack before tackling some of these behavioral edge cases. This PR might be the right answer, but I want to try to see if we can get the underlying parser (libgumbo) to do the right thing here first.

All of which is to say: I'm going to play with this for a bit.