dart-lang / html

Dart port of html5lib. For parsing HTML/HTML5 with Dart. Works in the client and on the server.
https://pub.dev/packages/html

Better handling for unclosed tags #248

Open ricardoboss opened 3 months ago

ricardoboss commented 3 months ago

Hi there!

I am having trouble parsing some HTML that I don't control and that contains unclosed tags. For example:

<html>
<head>
    <title>Hello World</title>

    <link href="test.css">
</head>
<body>
    <h1>Hello World</h1>
</body>
</html>

As you can see, the <link> tag is never closed. As a result, the parser treats everything after it as children of the <link> element and appends a "shadow" </link></head><body></body></html> at the end.

This makes it impossible to traverse the DOM.
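To make the resulting structure easier to see, here is a minimal sketch that parses the example above and prints the element tree. It assumes the `html` package is listed as a dependency and that the input is parsed with `parse` from `package:html/parser.dart`; `dumpTree` is just a helper name made up for this example, and the exact output may depend on the package version and on how the document is parsed.

```dart
import 'package:html/dom.dart';
import 'package:html/parser.dart' show parse;

// The same markup as in the example above.
const input = '''
<html>
<head>
    <title>Hello World</title>

    <link href="test.css">
</head>
<body>
    <h1>Hello World</h1>
</body>
</html>
''';

/// Prints one line per element, indented by its depth in the tree.
/// (Hypothetical helper, not part of the html package.)
void dumpTree(Element element, [int depth = 0]) {
  print('${'  ' * depth}<${element.localName}>');
  for (final child in element.children) {
    dumpTree(child, depth + 1);
  }
}

void main() {
  final Document document = parse(input);
  final root = document.documentElement;
  if (root != null) {
    dumpTree(root);
  }
}
```

If <body> or <h1> shows up indented below <link> in the printed tree, that is the nesting described above.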

I'd like a way to configure how such cases are handled, for example by specifying which tags cannot contain content (auto-closing tags), or through a setting that makes the parser automatically close any open tags once their parent tag has been closed.

Any help would be appreciated!