djspiewak / anti-xml

The scala.xml library has some very annoying issues. Time for a clean-room replacement!
http://anti-xml.org

anti-xml and HTML5 #7

Closed kzys closed 13 years ago

kzys commented 13 years ago

Hi Daniel,

I want to use anti-xml on an HTML document, so I've added a method that lets the caller switch the parser. With my patch, anti-xml can be used with the Validator.nu HTML5 parser:

object HTML extends com.codecommit.antixml.SAXParser {
  import com.codecommit.antixml._
  override def fromInputSource(source: org.xml.sax.InputSource): Group[Elem] =
    fromInputSource(source, new nu.validator.htmlparser.sax.HtmlParser)
}
...
// http://www.w3.org/TR/html5/the-end.html#an-introduction-to-error-handling-and-strange-cases-in-the-parser
val doc = HTML.fromString("<p>1<b>2<i>3</b>4</i>5</p>")

djspiewak commented 13 years ago

Hmm, I'm not entirely comfortable with two aspects of this:

Instead of this, what if we made NodeSeqSAXHandler part of the public API? This would allow you to use it with an arbitrary parse interface which can produce SAX events.

A more specific approach would be to add an HTMLParser to the Anti-XML library. I'm not particularly averse to that, though it would introduce some interesting dependency issues.
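
A minimal sketch of the first option, for illustration only: a tree-building SAX handler that can be driven by any org.xml.sax.XMLReader. It uses only the JDK's SAX classes. The names TreeBuildingHandler, ParseWith, Node, Text and Elem are hypothetical stand-ins invented for this sketch; they are not Anti-XML's NodeSeqSAXHandler or its node types, whose actual API may differ.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{Attributes, InputSource, XMLReader}
import org.xml.sax.helpers.DefaultHandler
import scala.collection.mutable.ListBuffer

// Hypothetical node types for this sketch (not Anti-XML's own).
sealed trait Node
final case class Text(value: String) extends Node
final case class Elem(name: String, children: List[Node]) extends Node

// Hypothetical stand-in for a publicly exposed tree-building handler.
class TreeBuildingHandler extends DefaultHandler {
  // Stack of (element name, children so far); the bottom frame is a synthetic root.
  private var stack: List[(String, ListBuffer[Node])] =
    List(("", ListBuffer.empty[Node]))

  override def startElement(uri: String, localName: String,
                            qName: String, attrs: Attributes): Unit =
    stack = (qName, ListBuffer.empty[Node]) :: stack

  override def characters(ch: Array[Char], start: Int, length: Int): Unit =
    stack.head._2 += Text(new String(ch, start, length))

  override def endElement(uri: String, localName: String, qName: String): Unit = {
    // Close the current element and attach it to its parent frame.
    val (name, children) = stack.head
    stack = stack.tail
    stack.head._2 += Elem(name, children.toList)
  }

  // Top-level nodes collected under the synthetic root.
  def result: List[Node] = stack.last._2.toList
}

object ParseWith {
  // Works with any XMLReader, i.e. any parser that can emit SAX events.
  def parse(source: InputSource, reader: XMLReader): List[Node] = {
    val handler = new TreeBuildingHandler
    reader.setContentHandler(handler)
    reader.parse(source)
    handler.result
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    // The JDK's default XML parser is used here as the event source.
    val reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader
    val nodes = ParseWith.parse(
      new InputSource(new StringReader("<p>1<b>2</b></p>")), reader)
    println(nodes)
  }
}
```

Because the handler only depends on SAX events, an HTML5 parser that implements XMLReader (the patch above suggests the Validator.nu HtmlParser does) could presumably be passed to ParseWith.parse in place of the JDK reader, without the library itself depending on it.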

kzys commented 13 years ago

Thank you. If NodeSeqSAXHandler becomes part of the public API, that would work well for my purpose.

I also think a dedicated HTMLParser would be too specific. Anti-XML's approach is very interesting, so I'd like to keep the possibility open for other parsers.

djspiewak commented 13 years ago

Done in 30f4fce. Thanks for all the feedback!