RazrFalcon / xmlparser

A low-level, pull-based, zero-allocation XML 1.0 parser.
Apache License 2.0

Unknown token error when comment is above DOCTYPE #11

Closed: sirkcion closed this issue 4 years ago

sirkcion commented 4 years ago

Parsing the following document results in the error "unknown token at 3:1":

<?xml version="1.0" encoding="UTF-8"?>
<!-- comment -->
<!DOCTYPE mydoc>
<a></a>

The above worked in v0.12.0 but fails in v0.13.0, probably due to the token detection rewrite.

A possible solution may be to re-add parse_comment to State::Start:

match state {
  State::Start => {
    if start == 0 && s.starts_with(b"<?xml ") {
      Some(Self::parse_declaration(s))
    } else if s.starts_with(b"<!DOCTYPE") {
      Some(Self::parse_doctype(s))
    } else if s.starts_with(b"<!--") {
      Some(Self::parse_comment(s))
    } else if s.starts_with_space() {
      s.skip_spaces();
      Self::parse_next_impl(s, state)
    } else {
      Self::parse_next_impl(s, State::AfterDtd)
    }
  }
  ...
}
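
To illustrate the idea outside the crate, here is a minimal, self-contained sketch of a prolog scanner that accepts comments before the DOCTYPE. It is not xmlparser's code and not necessarily how the fix on master works: `PrologToken` and `scan_prolog` are hypothetical names, and the sketch assumes no processing instructions and no DOCTYPE internal subset.

```rust
// Hypothetical stand-in for the tokenizer's start state; not xmlparser's real code.
#[derive(Debug, PartialEq)]
enum PrologToken {
    Declaration,
    Comment,
    Doctype,
}

/// Collects the tokens that precede the root element.
/// Returns Err(byte_offset) when an unknown or unterminated token is found.
/// Assumptions: no processing instructions, no DOCTYPE internal subset.
fn scan_prolog(text: &str) -> Result<Vec<PrologToken>, usize> {
    let mut tokens = Vec::new();
    let mut pos = 0;
    loop {
        // Skip whitespace between prolog tokens.
        let untrimmed = &text[pos..];
        let rest = untrimmed.trim_start();
        pos += untrimmed.len() - rest.len();

        // Pick the token kind, the length of its opener, and its closing delimiter.
        let (token, open_len, close) = if pos == 0 && rest.starts_with("<?xml ") {
            (PrologToken::Declaration, 6, "?>")
        } else if rest.starts_with("<!--") {
            // The branch the issue asks for: comments are valid before the DOCTYPE.
            (PrologToken::Comment, 4, "-->")
        } else if rest.starts_with("<!DOCTYPE") {
            (PrologToken::Doctype, 9, ">")
        } else if rest.starts_with('<') && !rest.starts_with("<!") && !rest.starts_with("<?") {
            // Anything else starting with '<' is assumed to be the root element.
            return Ok(tokens);
        } else {
            return Err(pos);
        };

        // Search for the closing delimiter after the opener and step past it.
        match rest[open_len..].find(close) {
            Some(idx) => {
                tokens.push(token);
                pos += open_len + idx + close.len();
            }
            None => return Err(pos),
        }
    }
}

fn main() {
    let text = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!-- comment -->\n<!DOCTYPE mydoc>\n<a></a>";
    match scan_prolog(text) {
        // prints: prolog tokens: [Declaration, Comment, Doctype]
        Ok(tokens) => println!("prolog tokens: {:?}", tokens),
        Err(pos) => println!("unknown token at byte {}", pos),
    }
}
```

Without the `<!--` branch, the scanner would stop at the comment on line 2 of the document. That is similar in spirit to the reported error, although the real tokenizer reports a different position.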

RazrFalcon commented 4 years ago

This was already fixed on master. I will publish it now.