capricorn86 / happy-dom

A JavaScript implementation of a web browser without its graphical user interface
MIT License

Issue with tag order parsing XML document #282

Open · Leprosy opened 3 years ago

Leprosy commented 3 years ago

Hi, I'm trying to parse this XML document:

const doc = new DOMParser().parseFromString(`<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE start SYSTEM "http://xml.start.com/pub/start.dtd">
<start>
  <div style="--en-clipped-content:fullPage; --en-clipped-source-url:https://l3pro.netlify.app/; --en-clipped-source-title:https://l3pro.netlify.app/;">
<div><br></br></div><div style="; font-size: 16px; display:inline-block; min-width: 100%; position: relative;"> <span><div>
  <div>
    <h1 title="H1 " style="text-align:center;color:#006600;text-decoration:underline;"> This is a test </h1>
    <h2></h2>

    <p title="P " style="color:#660000;font-family:sans-serif;">This has some dynamic generated content</p>
    <p title="P " style="color:#660000;font-family:sans-serif;">This should be enought to test</p>
    <hr></hr>

    <p title="P " style="color:#660000;font-family:sans-serif;">Absolute paths</p>
    <img src="https://l3pro.netlify.app/celes.jpeg" width="50" height="100"></img>
    <img src="https://l3pro.netlify.app/img/guitar.jpeg" width="50" height="100"></img>
        <img src="https://l3pro.netlify.app/img/mk2.jpg" width="50" height="100"></img>
            <img src="https://l3pro.netlify.app/img/yo.png" width="50" height="100"></img>

<p title="P " style="color:#660000;font-family:sans-serif;">Relative paths</p>
    <img src="celes.jpeg" width="50" height="100"></img>
    <img src="img/guitar.jpeg" width="50" height="100"></img>
    <img src="img/mk2.jpg" width="50" height="100"></img>
                <img src="img/yo.png" width="50" height="100"></img>

</div></div></span></div>
</div>
</start>`, 'application/xml')

But this is obtained as a result:

>>> doc.firstElementChild.innerHTML

"<head></head><body><!DOCTYPE start SYSTEM \"http://xml.start.com/pub/enml2.dtd\"><start>\n  <div style=\"--en-clipped-content:fullPage; --en-clipped-source-url:https://l3pro.netlify.app/; --en-clipped-source-title:https://l3pro.netlify.app/;\">\n<div style=\"; font-size: 16px; display:inline-block; min-width: 100%; position: relative;\"> <span><div>\n  <div>\n    <h1 title=\"H1 \" style=\"text-align:center;color:#006600;text-decoration:underline;\"> This is a test </h1>\n    <h2></h2>\n\n    <p title=\"P \" style=\"color:#660000;font-family:sans-serif;\">This has some dynamic generated content</p>\n    <p title=\"P \" style=\"color:#660000;font-family:sans-serif;\">This should be enought to test</p>\n    <hr/></div>\n\n    <p title=\"P \" style=\"color:#660000;font-family:sans-serif;\">Absolute paths</p>\n    <img src=\"https://l3pro.netlify.app/celes.jpeg\" width=\"50\" height=\"100\"/></div>\n    <img src=\"https://l3pro.netlify.app/img/guitar.jpeg\" width=\"50\" height=\"100\"/></span>\n        <img src=\"https://l3pro.netlify.app/img/mk2.jpg\" width=\"50\" height=\"100\"/></div>\n            <img src=\"https://l3pro.netlify.app/img/yo.png\" width=\"50\" height=\"100\"/></div>\n\n<p title=\"P \" style=\"color:#660000;font-family:sans-serif;\">Relative paths</p>\n    <img src=\"celes.jpeg\" width=\"50\" height=\"100\"/></start>\n    <img src=\"img/guitar.jpeg\" width=\"50\" height=\"100\"/>\n    <img src=\"img/mk2.jpg\" width=\"50\" height=\"100\"/>\n                <img src=\"img/yo.png\" width=\"50\" height=\"100\"/>\n\n\n    \n  \n\n\n\n</body>"

The main issue is that the <start> and <span> tags are closed in the wrong position. I compared the parsed document with the result from Google Chrome's DOMParser implementation, which parses it correctly. Is there some option, configuration, or parameter I'm missing?

capricorn86 commented 3 years ago

Thank you for reporting @Leprosy! :slightly_smiling_face:

I believe the issue is caused by Happy DOM not supporting "application/xml", so it falls back to treating the input as HTML.
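A quick way to detect this fallback in a test is to compare the parsed document's root element with the XML root you expect. For the document above, a real XML parse should give a root named "start", while the HTML fallback gives the html element (the innerHTML output above starts with <head></head><body>). The helper below, hasExpectedRoot, is a hypothetical sketch, not part of Happy DOM's API; it only relies on the standard documentElement and nodeName properties.

```javascript
// Hypothetical helper (not part of happy-dom): returns true when the parsed
// document's root element has the expected name.
function hasExpectedRoot(doc, expectedRootName) {
  const root = doc.documentElement;
  // XML parsing preserves tag-name case, so compare the names exactly.
  return root != null && root.nodeName === expectedRootName;
}

// Usage, only where a DOMParser global exists (a browser, or a happy-dom/jsdom window).
if (typeof DOMParser !== 'undefined') {
  const doc = new DOMParser().parseFromString('<start><a/></start>', 'application/xml');
  if (!hasExpectedRoot(doc, 'start')) {
    // Under an HTML fallback the root is the html element instead of <start>.
    console.warn('Parsed as HTML, root is <' + doc.documentElement.nodeName + '>');
  }
}
```

Checking the root name is a cheap signal rather than a full validation: it catches the fallback described here, but not every structural difference between an HTML and an XML parse.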

I will look into adding support for "application/xml".

Leprosy commented 3 years ago

Glad to be of help! As I said before, excellent library!

Pyrolistical commented 2 years ago

I ran into this when using new DOMParser().parseFromString(file, "image/svg+xml").

It would be a better user experience if parseFromString threw an error for unimplemented MIME types.
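A minimal sketch of that suggestion, written as a hypothetical wrapper around any DOMParser-like object rather than as a change to Happy DOM itself. The five MIME types listed are the ones the DOMParser specification accepts (any other value is rejected with a TypeError). The implementedTypes argument is an assumption the caller has to supply, since which types a given Happy DOM version really implements is not stated here.

```javascript
// MIME types the DOMParser specification accepts for parseFromString().
const SPEC_MIME_TYPES = [
  'text/html',
  'text/xml',
  'application/xml',
  'application/xhtml+xml',
  'image/svg+xml',
];

// Hypothetical wrapper: `implementedTypes` lists which spec types the
// underlying parser really handles (supplied by the caller).
function strictParseFromString(parser, source, mimeType, implementedTypes) {
  if (!SPEC_MIME_TYPES.includes(mimeType)) {
    // The spec itself rejects unknown types with a TypeError.
    throw new TypeError('Unsupported MIME type: ' + mimeType);
  }
  if (!implementedTypes.includes(mimeType)) {
    // Fail loudly instead of silently falling back to HTML parsing.
    throw new Error('MIME type not implemented by this parser: ' + mimeType);
  }
  return parser.parseFromString(source, mimeType);
}
```

With implementedTypes set to ['text/html'], a call with "image/svg+xml" would throw immediately instead of returning an HTML document with the SVG markup inside its body.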

Sadly, the workaround for parsing SVG is to use jsdom:

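import jsdom from "jsdom";

// Take DOMParser from a jsdom window instead of happy-dom.
const {
  window: { DOMParser },
} = new jsdom.JSDOM();
const svgDoc = new DOMParser().parseFromString(file, "image/svg+xml");

stevebeauge commented 1 month ago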

Any update on this three-year-old issue?

Thanks