dlang-community / experimental.xml

A replacement of Phobos std.xml
https://rawgit.com/dlang-community/experimental.xml/gh-pages/index.html
Boost Software License 1.0

What about html(5?) support? #10

Open wilzbach opened 6 years ago

wilzbach commented 6 years ago

From @trikko on August 1, 2016 7:28

I wonder whether it would be too difficult to also support HTML5. IMO it would be a good idea for web-related applications.

Copied from original issue: lodo1995/experimental.xml#28

wilzbach commented 6 years ago

From @Hackerpilot on August 1, 2016 8:15

HTML is not XML. I don't think this is a reasonable feature request.

For further information about the madness that HTML supports, check out the spec here: https://www.w3.org/TR/html5/syntax.html#tree-construction. Note the gigantic state machine specified for parsing malformed tags.

wilzbach commented 6 years ago

From @Hackerpilot on August 1, 2016 8:17

Of course if your HTML input also happens to be XHTML, then there shouldn't be a problem.

wilzbach commented 6 years ago

From @trikko on August 1, 2016 8:17

I know it's not the same. But maybe at least XHTML 5 could be interesting.

wilzbach commented 6 years ago

From @trikko on August 1, 2016 8:21

(Anyway: I don't care too much about parsing malformed HTML and fixing it. It would be interesting to have DOM-related functions for tree manipulation and for outputting valid HTML5.)

wilzbach commented 6 years ago

From @rjmcguire on August 1, 2016 8:54

+1. I think it would be irresponsible to let the definitive standard XML parser fix dodgy HTML/XML; there are tools for that. It will be important, though, to check during this library's experimental phase how hard it is to do HTML5 parsing and output with the standard library.

wilzbach commented 6 years ago

From @lodo1995 on August 1, 2016 12:15

@trikko, as @Hackerpilot said, it's not possible to parse all HTML with an XML parser. That said, the idea is to keep the components of this library as independent and generic as possible. For example, the parser and cursor do not check for correct element nesting, and the parser doesn't even need to parse attributes. So this library already provides some building blocks for parsing HTML. If your HTML happens to be XHTML, you can even use this library to build a DOM. You can use the provided DOM implementation, which will have full DOM Level 3 support, or you can create a custom DOM hierarchy with advanced HTML/SVG/whatever-you-need support, based on the provided one, and have the provided DOMBuilder build it.
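
The XHTML point made in this thread can be illustrated with any conforming XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` rather than experimental.xml, since the thread does not show that library's API; the sample documents and the helper names `parses_as_xml` and `add_paragraph` are made up for illustration. It shows that an XHTML document parses, can be manipulated as a tree, and can be written back out, while a typical HTML5 document (void `<br>`, omitted `</p>`) is rejected as not well-formed.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample input: well-formed XHTML, every element is closed.
XHTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Demo</title></head>"
    "<body><p>Hello<br/>world</p></body>"
    "</html>"
)

# Hypothetical sample input: valid HTML5, but not well-formed XML
# (<br> has no closing slash and </p> is omitted).
HTML5 = (
    "<html><head><title>Demo</title></head>"
    "<body><p>Hello<br>world</body></html>"
)


def parses_as_xml(text: str) -> bool:
    """Return True if `text` is well-formed XML according to the parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def add_paragraph(xhtml: str, content: str) -> str:
    """Parse XHTML, append a <p> to <body>, and serialize the tree again."""
    # Serialize XHTML elements with a default namespace, not an ns0: prefix.
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(xhtml)
    body = root.find(f"{{{XHTML_NS}}}body")
    paragraph = ET.SubElement(body, f"{{{XHTML_NS}}}p")
    paragraph.text = content
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(parses_as_xml(XHTML))   # XHTML is accepted
    print(parses_as_xml(HTML5))   # HTML5 tag soup is rejected
    print(add_paragraph(XHTML, "Added"))
```

Note that the serializer here emits XML syntax, so the result is XHTML; producing the HTML5 syntax proper would need a separate output step.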