capricorn86 / happy-dom

A JavaScript implementation of a web browser without its graphical user interface
MIT License

DOMParser does not recognise BODY and thus creates it twice #1615

Open OlaviSau opened 3 hours ago

OlaviSau commented 3 hours ago

Describe the bug
DOMParser does not behave like the browser implementation. This prevents frameworks such as Angular from setting innerHTML as an attribute, which breaks most renders in Angular. The cause is that a `<body>` tag in the input is not recognised, so HappyDOM creates it as an extra element inside the document's body.

To Reproduce
`(new window.DOMParser()).parseFromString("<body><x></x>Example Text", "text/html").body.innerHTML`
HappyDOM: `<body><x></x>Example Text</body>`

Expected behavior
Chrome: `<x></x>Example Text`

Additional context
A `<body>` tag in the input (tagName BODY) should not be created as a regular element node. Relevant lines in DOMParser.ts:
https://github.com/capricorn86/happy-dom/blob/afd256b2e4f0260adb22432c1a354f558cda6623/packages/happy-dom/src/dom-parser/DOMParser.ts#L79
https://github.com/capricorn86/happy-dom/blob/afd256b2e4f0260adb22432c1a354f558cda6623/packages/happy-dom/src/dom-parser/DOMParser.ts#L87

@capricorn86

OlaviSau commented 2 hours ago

@capricorn86 I have created a naive fix for this. It does not handle every case, for example multiple body elements as siblings, or body elements nested inside body. Those cases will still fail and would need some rework of how HappyDOM appends and parses the HTML.

The sibling case could be handled by adding the snippet at the end of this comment to the appendChild part of DOMParser, but that would not solve the child case. It may be better to handle this in XMLParser, but at that point it is no longer an XML parser but an HTML parser, since XML has no rule against multiple BODY tags. It should not be handled in the appendChild() method itself, because browsers do allow appending multiple BODY elements into the HTML. This makes it difficult to solve in a way that is both fast and correct.
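As a rough illustration of handling this while the tree is built (rather than in appendChild), here is a simplified tree builder. The `Token` and `ElementNode` types and the `buildBody` function are hypothetical and are not HappyDOM's XMLParser API. The rule it models is roughly the HTML parsing behaviour in the "in body" insertion mode: an extra `<body>` start tag does not create an element (browsers only copy missing attributes onto the existing body), and `</body>` does not close any open element. Because the rule is applied during tree construction, the sibling and nested cases are both covered in one pass. Everything else the HTML spec does (implied end tags, void elements, and so on) is left out.

```typescript
// Hypothetical token and node types for illustration; not HappyDOM's XMLParser types.
type Token =
  | { kind: "start"; tag: string }
  | { kind: "end"; tag: string }
  | { kind: "text"; value: string };

interface ElementNode {
  tag: string;
  children: TreeNode[];
}
type TreeNode = ElementNode | string;

// Builds the contents of an implicit <body> from a flat token stream.
// BODY start and end tags never create or close an element, so extra
// (sibling or nested) <body> tags cannot add a second BODY to the tree.
function buildBody(tokens: Token[]): ElementNode {
  const body: ElementNode = { tag: "body", children: [] };
  const stack: ElementNode[] = [body]; // open elements, innermost last

  for (const token of tokens) {
    const current = stack[stack.length - 1];
    if (token.kind === "text") {
      current.children.push(token.value);
    } else if (token.tag === "body") {
      continue; // the body already exists; extra BODY tags are ignored
    } else if (token.kind === "start") {
      const element: ElementNode = { tag: token.tag, children: [] };
      current.children.push(element);
      stack.push(element);
    } else {
      // Close the nearest open element with the same tag; never pop the body itself.
      for (let i = stack.length - 1; i > 0; i--) {
        if (stack[i].tag === token.tag) {
          stack.length = i;
          break;
        }
      }
    }
  }
  return body;
}

// Serialises an element's children, roughly like innerHTML (no escaping).
function innerHTML(element: ElementNode): string {
  return element.children
    .map((child) =>
      typeof child === "string" ? child : `<${child.tag}>${innerHTML(child)}</${child.tag}>`,
    )
    .join("");
}

// Tokens for "<body><x></x>Example Text" (tokenizer not shown).
const tokens: Token[] = [
  { kind: "start", tag: "body" },
  { kind: "start", tag: "x" },
  { kind: "end", tag: "x" },
  { kind: "text", value: "Example Text" },
];
console.log(innerHTML(buildBody(tokens))); // <x></x>Example Text
```

Feeding it the tokens for `<body><x></x><body>y</body>z` gives `<x></x>yz`, and for `<x><body>y</body></x>` gives `<x>y</x>`, so neither case produces a nested BODY.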

If performance is not a concern, a naive solution could walk the document tree that XMLParser outputs and remove every element with tagName BODY. If performance is a concern, either an HTMLParser that ignores secondary body tags could be created, or an option for this HTML-specific rule could be added to XMLParser. Personally, I think creating an HTMLParser makes more sense, since browsers implement many optimisations and unusual rules for parsing HTML.
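A sketch of the naive post-processing walk, again over a hypothetical minimal tree (`ElementNode`, `TreeNode`, `unwrapBodies`) rather than HappyDOM's node classes. One deliberate difference from the wording above: BODY elements are unwrapped (replaced by their own children) instead of removed outright, so their content is kept, which matches the Chrome output in the report. Each node is visited once, so the cost is linear in the size of the tree, but it is still an extra full pass after parsing.

```typescript
// Hypothetical minimal tree model; HappyDOM's nodes and XMLParser output differ.
interface ElementNode {
  tagName: string;
  children: TreeNode[];
}
type TreeNode = ElementNode | string;

// Returns `nodes` with every BODY element (at any depth) replaced by its
// own children, so the content of extra BODY tags stays where it was.
function unwrapBodies(nodes: TreeNode[]): TreeNode[] {
  const result: TreeNode[] = [];
  for (const node of nodes) {
    if (typeof node === "string") {
      result.push(node); // text nodes are kept as-is
      continue;
    }
    const children = unwrapBodies(node.children); // fix descendants first
    if (node.tagName === "BODY") {
      result.push(...children); // unwrap: keep the content, drop the element
    } else {
      node.children = children;
      result.push(node);
    }
  }
  return result;
}

// Serialises a node list, roughly like innerHTML (no escaping).
function innerHTML(nodes: TreeNode[]): string {
  return nodes
    .map((node) => {
      if (typeof node === "string") return node;
      const tag = node.tagName.toLowerCase();
      return `<${tag}>${innerHTML(node.children)}</${tag}>`;
    })
    .join("");
}

// Parsed root for "<body><x></x>Example Text", nested as in the HappyDOM output.
const parsed: TreeNode[] = [
  { tagName: "BODY", children: [{ tagName: "X", children: [] }, "Example Text"] },
];
console.log(innerHTML(unwrapBodies(parsed))); // <x></x>Example Text
```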

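```typescript
// Inside the appendChild loop in DOMParser: if the next parsed root node
// is a BODY element, remove it from the parsed root instead of appending it.
if (root[PropertySymbol.nodeArray][0]['tagName'] === "BODY") {
    root.removeChild(root[PropertySymbol.nodeArray][0]);
    continue;
}
```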
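If the parser nests the content that follows `<body>` inside that BODY element, as the HappyDOM output `<body><x></x>Example Text</body>` in the report suggests, then removing the element also removes `<x></x>Example Text`. A variant that unwraps the top-level BODY instead might look like the sketch below. `appendParsedRoot` and the tree types are hypothetical stand-ins for the DOMParser loop and HappyDOM nodes, not HappyDOM's API. Like the snippet above, it only handles BODY elements at the top of the parsed root (including top-level siblings), not BODY elements nested inside non-BODY elements.

```typescript
// Hypothetical stand-ins for HappyDOM nodes; not HappyDOM's actual classes.
interface ElementNode {
  tagName: string;
  children: TreeNode[];
}
type TreeNode = ElementNode | string;

// Moves parsed root nodes into `body` one at a time, front first. A BODY
// element is not appended; its children are put back at the front of the
// queue instead, so its content still ends up in `body`.
function appendParsedRoot(rootNodes: TreeNode[], body: ElementNode): void {
  while (rootNodes.length > 0) {
    const node = rootNodes.shift()!;
    if (typeof node !== "string" && node.tagName === "BODY") {
      rootNodes.unshift(...node.children);
      continue;
    }
    body.children.push(node);
  }
}

// Serialises an element's children, roughly like innerHTML (no escaping).
function innerHTML(element: ElementNode): string {
  return element.children
    .map((child) => {
      if (typeof child === "string") return child;
      const tag = child.tagName.toLowerCase();
      return `<${tag}>${innerHTML(child)}</${tag}>`;
    })
    .join("");
}

const body: ElementNode = { tagName: "BODY", children: [] };
appendParsedRoot(
  [{ tagName: "BODY", children: [{ tagName: "X", children: [] }, "Example Text"] }],
  body,
);
console.log(innerHTML(body)); // <x></x>Example Text
```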

https://github.com/capricorn86/happy-dom/pull/1616