flavorjones / loofah

Ruby library for HTML/XML transformation and sanitization
MIT License
934 stars 137 forks

`#text` should only render HTML elements #221

Closed weiqingtoh closed 2 years ago

weiqingtoh commented 2 years ago

Not sure if this is an issue, but a question here: is it intended for #text to include text from non-visible HTML nodes? It seems we still get the contents of <style> tags as well as HTML comments (<!-- Some text -->).

Thank you!

Loofah.fragment("<style> some css</style><p> some text</p>").text
=> " some css some text"

Loofah.fragment("<style> some css</style> <!-- some HTML comment --><p> some text</p>").text
=> " some css  some HTML comment  some text"

flavorjones commented 2 years ago

@weiqingtoh Thanks for asking this question! I think this is a bug in Loofah's implementation of Node#text:

html_frag = "<style> some css</style> <!-- some HTML comment --><p> some text</p>"

Nokogiri::HTML::DocumentFragment.parse(html_frag).text
=> " some css  some text"

Loofah.fragment(html_frag).text
=> " some css  some HTML comment  some text"

Nokogiri uses a libxml2 function to serialize the text. Loofah overrides this method, but its override includes every child node (such as the comment above), not just visible elements.

I'll work on a fix!
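
As a rough illustration of the difference described above, here is a toy model in plain Ruby. It is not Loofah or Nokogiri code: the `Node` struct, the helper methods, and the assumption that a comment node returns its own content when asked directly are all hypothetical, chosen only to reproduce the two outputs shown in this thread.

```ruby
# Toy node model. NOT Loofah or Nokogiri code; every name here is hypothetical.
Node = Struct.new(:kind, :value, :children) do
  def comment?
    kind == :comment
  end

  # Content of a single node when asked directly (an assumption of this sketch):
  # text and comment nodes return their own value, while elements
  # concatenate their descendants' text and skip nested comments.
  def content
    return value if kind == :text || kind == :comment
    children.reject(&:comment?).map(&:content).join
  end
end

def text_node(str)
  Node.new(:text, str, [])
end

def comment(str)
  Node.new(:comment, str, [])
end

def element(*children)
  Node.new(:element, nil, children)
end

# Mirrors "<style> some css</style> <!-- some HTML comment --><p> some text</p>"
TOP_LEVEL = [
  element(text_node(" some css")),
  text_node(" "),
  comment(" some HTML comment "),
  element(text_node(" some text")),
].freeze

# Joining the content of every top-level child lets the comment through.
def join_all(nodes)
  nodes.map(&:content).join
end

# Dropping comment nodes first leaves only text and element content.
def join_without_comments(nodes)
  nodes.reject(&:comment?).map(&:content).join
end

puts join_all(TOP_LEVEL).inspect
puts join_without_comments(TOP_LEVEL).inspect
```

In this model, `join_all` yields the string Loofah returned above, and `join_without_comments` yields the string Nokogiri returned. Note that the `<style>` contents appear in both, matching the Nokogiri output in this thread.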