OWASP / java-html-sanitizer

Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.

empty `<menuitem>`s misnest #96

Open mikesamuel opened 7 years ago

mikesamuel commented 7 years ago

wynne.jg reports

Sanitizing

```html
<div contenteditable="true" contextmenu="MY_MENU" dir="ltr" draggable="true" dropzone="copy" hidden="" spellcheck="true" translate="yes">
  <menu id="MY_MENU" type="context">
    <menuitem label="Refresh"></menuitem>
    <menuitem label="Twitter"></menuitem>
  </menu> LOTS OF TEXT HERE
</div>
```

yields the following

```html
<div contenteditable="true" contextmenu="MY_MENU" dir="ltr" draggable="true" dropzone="copy" hidden="" spellcheck="true" translate="yes">
  <menu id="MY_MENU" type="context">
    <menuitem label="Refresh"> <menuitem label="Twitter"></menuitem></menuitem>
  </menu> LOTS OF TEXT HERE
</div>
```

Note that the `<menuitem>`s are nested in the output but are siblings in the input.
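To make the misnesting easy to check mechanically, here is a minimal sketch of a naive tag-depth scan. `MenuitemDepth` and `menuitemDepths` are hypothetical names, not part of java-html-sanitizer. The scan assumes every start tag has an explicit end tag and that no attribute value contains `>`, which holds for the snippets in the report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for checking the report above; not part of java-html-sanitizer.
// Naive by design: assumes every start tag has a matching end tag (no void or
// self-closing elements, no comments) and that no attribute value contains '>'.
class MenuitemDepth {
    // Group 1 is "/" for an end tag (empty for a start tag); group 2 is the tag name.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    /** Returns the nesting depth of each menuitem start tag, in document order. */
    static List<Integer> menuitemDepths(String html) {
        List<Integer> depths = new ArrayList<>();
        int depth = 0;
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            if (!m.group(1).isEmpty()) {
                depth--; // end tag closes the current element
            } else {
                if (m.group(2).equalsIgnoreCase("menuitem")) {
                    depths.add(depth); // record depth before descending into it
                }
                depth++; // start tag opens a new element
            }
        }
        return depths;
    }

    public static void main(String[] args) {
        // Sibling <menuitem>s, as in the input from the report.
        String input = "<menu id=\"MY_MENU\" type=\"context\">"
                + "<menuitem label=\"Refresh\"></menuitem>"
                + "<menuitem label=\"Twitter\"></menuitem>"
                + "</menu>";
        // Nested <menuitem>s, as in the sanitized output from the report.
        String output = "<menu id=\"MY_MENU\" type=\"context\">"
                + "<menuitem label=\"Refresh\"> <menuitem label=\"Twitter\"></menuitem></menuitem>"
                + "</menu>";
        System.out.println(menuitemDepths(input));  // prints [1, 1]
        System.out.println(menuitemDepths(output)); // prints [1, 2]
    }
}
```

Both `<menuitem>`s sit at depth 1 in the input, while in the sanitized output the second one sits at depth 2, inside the first.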

pickle-weasle commented 7 years ago

A bit more information on this one: the tags only seem to get misnested when there is no text between the opening and closing tags. For example, `<menuitem label="Refresh"></menuitem><menuitem label="Twitter"></menuitem>` will be misnested, whereas `<menuitem label="Refresh">REFRESH</menuitem><menuitem label="Twitter">TWITTER</menuitem>` will be handled fine.

From looking at some resources online, this element is apparently an empty element that has no permitted content between its opening and closing tags. Also, most examples I've found don't put any text between the tags. However, from playing around with the element, it seems that browsers use the text between the tags as a fallback if no `label` attribute is specified. Ideally it'd be nice if the sanitizer could handle both forms.
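The fallback described above can be sketched as follows. `MenuitemLabel.effectiveLabel` is a hypothetical helper, not part of java-html-sanitizer or any browser API; it only encodes the behavior observed in this comment (a non-empty `label` attribute wins, otherwise the trimmed text content is used) and is not taken from a spec.

```java
import java.util.Optional;

// Hypothetical helper encoding the observed <menuitem> label fallback;
// not part of java-html-sanitizer and not taken from the HTML spec.
final class MenuitemLabel {
    /** Label attribute first; otherwise the trimmed text content; otherwise nothing. */
    static Optional<String> effectiveLabel(String labelAttr, String textContent) {
        if (labelAttr != null && !labelAttr.isEmpty()) {
            return Optional.of(labelAttr);
        }
        String text = textContent == null ? "" : textContent.trim();
        return text.isEmpty() ? Optional.empty() : Optional.of(text);
    }

    public static void main(String[] args) {
        System.out.println(effectiveLabel("Refresh", ""));   // prints Optional[Refresh]
        System.out.println(effectiveLabel(null, "TWITTER")); // prints Optional[TWITTER]
        System.out.println(effectiveLabel(null, "  "));      // prints Optional.empty
    }
}
```

Whichever form the markup uses, the label comes from either the attribute or the element's text, so handling both forms means preserving both.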