Masterminds / html5-php

An HTML5 parser and serializer for PHP.
http://masterminds.github.io/html5-php/

Retain closing br tag as though it were a normal br tag #185

Closed: IMSoP closed this issue 4 years ago

IMSoP commented 4 years ago

The WHATWG spec includes a special rule for handling </br>, in the section on parsing when "in body":

An end tag whose tag name is "br": Parse error. Drop the attributes from the token, and act as described in the next entry; i.e. act as if this was a "br" start tag token with no attributes, rather than the end tag token that it actually is.

The result is that invalid HTML like Hello <br>World</br>! will be rendered by browsers as though it had two linebreaks, Hello <br>World<br>!. This library currently (quite reasonably!) removes the erroneous end tag instead, giving Hello <br>World!

I'm using this library for processing some messy HTML, and it would be useful to have this rule match the spec / browser behaviour.
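To make the two behaviours concrete, here is a minimal sketch in Python using the standard library's html.parser. It is not html5-php code and does not use its API; the class name BrNormalizer, the function normalize, and the spec_compliant flag are all made up for illustration. It re-serializes a token stream and either rewrites </br> as <br> (the spec rule quoted above) or drops it (the behaviour described for this library).

```python
from html import escape
from html.parser import HTMLParser


class BrNormalizer(HTMLParser):
    """Hypothetical helper: re-serializes tokens, special-casing </br>."""

    def __init__(self, spec_compliant=True):
        super().__init__(convert_charrefs=True)
        self.spec_compliant = spec_compliant
        self.out = []

    def handle_starttag(self, tag, attrs):
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
        )
        self.out.append(f"<{tag}{rendered}>")

    def handle_startendtag(self, tag, attrs):
        # <br/> is a start tag only; do not also fire handle_endtag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag == "br":
            # Spec rule: treat </br> as a <br> start tag with no attributes.
            # Otherwise: silently drop the stray end tag.
            if self.spec_compliant:
                self.out.append("<br>")
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def normalize(html, spec_compliant=True):
    parser = BrNormalizer(spec_compliant)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(normalize("Hello <br>World</br>!"))         # spec / browser behaviour
print(normalize("Hello <br>World</br>!", False))  # stray end tag dropped
```

The first call yields the two-linebreak form that browsers produce; the second yields the single-linebreak form.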

goetas commented 4 years ago

If that is the browsers' behavior, I think we should do the same.

Are you willing to submit a patch for this?

IMSoP commented 4 years ago

I'll give it a go; by the looks of it, it will just need a special case at the top of DOMTreeBuilder::endTag.
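As a rough illustration of what "a special case at the top of the end-tag handler" could look like, here is a toy tree builder in Python. It is only a sketch under stated assumptions: ToyTreeBuilder, its methods, and serialize are hypothetical and do not reflect the actual structure of DOMTreeBuilder::endTag in html5-php. The point it shows is that an ordinary end tag looks for a matching open element, which a void element like br never has, so the check for "br" has to come before that lookup.

```python
class ToyTreeBuilder:
    """Hypothetical builder for illustration; not html5-php's DOMTreeBuilder."""

    VOID = {"br", "hr", "img", "input"}

    def __init__(self):
        self.root = {"tag": "#root", "children": []}
        self.stack = [self.root]  # stack of open elements

    def start_tag(self, name, attrs=None):
        node = {"tag": name, "attrs": dict(attrs or {}), "children": []}
        self.stack[-1]["children"].append(node)
        if name not in self.VOID:
            self.stack.append(node)  # void elements are never left open

    def text(self, data):
        self.stack[-1]["children"].append(data)

    def end_tag(self, name):
        # Special case first: </br> acts as a <br> start tag with no attributes.
        if name == "br":
            self.start_tag("br")
            return
        # Normal path: close the nearest matching open element, if any.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == name:
                del self.stack[i:]
                return
        # No match: the stray end tag is ignored.


def serialize(node):
    if isinstance(node, str):
        return node
    inner = "".join(serialize(child) for child in node["children"])
    if node["tag"] == "#root":
        return inner
    if node["tag"] in ToyTreeBuilder.VOID:
        return f"<{node['tag']}>"
    return f"<{node['tag']}>{inner}</{node['tag']}>"


builder = ToyTreeBuilder()
builder.text("Hello ")
builder.start_tag("br")
builder.text("World")
builder.end_tag("br")
builder.text("!")
print(serialize(builder.root))
```

Without the early return for "br", the lookup loop would find no open br element and the end tag would fall through to the ignored case, which matches the behaviour reported at the top of this issue.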