RazrFalcon / xmlparser

A low-level, pull-based, zero-allocation XML 1.0 parser.
Apache License 2.0

unable to parse attribute with chevron #29

Closed: jdrouet closed this issue 1 month ago

jdrouet commented 1 month ago

I'm trying to parse an element attribute whose value is `<%asm_group_unsubscribe_raw_url%>`, but parsing fails.

I reproduced it in this repository by adding the following test to tests/integration/elements.rs:

```rust
test!(
    attribute_08,
    "<c q:a=\"<%asm_group_unsubscribe_raw_url%>\"/>",
    Token::ElementStart("", "c", 0..2),
    Token::Attribute("q", "a", "<%asm_group_unsubscribe_raw_url%>", 3..44),
    Token::ElementEnd(ElementEnd::Empty, 44..46)
);
```

Running it fails with:

```text
---- elements::attribute_08 stdout ----
thread 'elements::attribute_08' panicked at tests/integration/elements.rs:251:1:
assertion `left == right` failed
  left: Error("invalid attribute at 1:3 cause expected '\"' not '<' at 1:9")
 right: Attribute("q", "a", "<%asm_group_unsubscribe_raw_url%>", 3..44)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

RazrFalcon commented 1 month ago

XML doesn't allow an unescaped `<` in attribute values.
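For the original use case, this means the template placeholder has to be escaped before it is written into the attribute. Below is a minimal Rust sketch of that escaping. `escape_attr` is a hypothetical helper, not part of xmlparser, and it assumes the value ends up inside double quotes.

```rust
/// Hypothetical helper (not part of xmlparser): escapes the characters that
/// may not appear literally inside a double-quoted XML attribute value.
fn escape_attr(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for ch in value.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '"' => out.push_str("&quot;"),
            // '>' is legal in attribute values; escaping it is optional but common.
            '>' => out.push_str("&gt;"),
            _ => out.push(ch),
        }
    }
    out
}

fn main() {
    let raw = "<%asm_group_unsubscribe_raw_url%>";
    let xml = format!("<c q:a=\"{}\"/>", escape_attr(raw));
    // prints: <c q:a="&lt;%asm_group_unsubscribe_raw_url%&gt;"/>
    println!("{}", xml);
}
```

Escaping `&` and `<` is what the grammar requires; `"` must be escaped only because the value is double-quoted. As far as I know, xmlparser does not resolve references itself, so the `Attribute` token would carry the escaped text and decoding it back is left to the caller.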

jdrouet commented 1 month ago

Oh really?!

RazrFalcon commented 1 month ago

https://www.w3.org/TR/xml/#NT-AttValue

```text
AttValue ::= '"' ([^<&"] | Reference)* '"'
          |  "'" ([^<&'] | Reference)* "'"
```

Although the EntityValue production allows the definition of a general entity consisting of a single explicit < in the literal (e.g., <!ENTITY mylt "<">), it is strongly advised to avoid this practice since any reference to that entity will cause a well-formedness error.
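To make the production concrete, here is a rough Rust sketch (not xmlparser's actual code) that checks the text between the double quotes against `([^<&"] | Reference)*`. The names `check_att_value` and `is_reference_body` are hypothetical. The entity-name check is simplified to ASCII (the real `Name` production accepts many more characters), and character references are not range-checked.

```rust
/// Checks the content of a double-quoted AttValue (the text between the
/// quotes) against ([^<&"] | Reference)*.
/// Returns the byte offset of the first offending character on failure.
fn check_att_value(content: &str) -> Result<(), usize> {
    let bytes = content.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            // A literal '<' or the delimiting quote is never allowed.
            b'<' | b'"' => return Err(i),
            b'&' => {
                // '&' must start a Reference: &Name; or &#digits; or &#xhex;
                let rest = &content[i + 1..];
                let end = rest.find(';').ok_or(i)?;
                if !is_reference_body(&rest[..end]) {
                    return Err(i);
                }
                i += 1 + end + 1; // skip '&', the reference body and ';'
            }
            // Multi-byte UTF-8 sequences never contain '<', '"' or '&' bytes.
            _ => i += 1,
        }
    }
    Ok(())
}

/// Simplified: ASCII-only entity names, no range check on character values.
fn is_reference_body(body: &str) -> bool {
    if let Some(hex) = body.strip_prefix("#x") {
        !hex.is_empty() && hex.chars().all(|c| c.is_ascii_hexdigit())
    } else if let Some(dec) = body.strip_prefix('#') {
        !dec.is_empty() && dec.chars().all(|c| c.is_ascii_digit())
    } else {
        let mut chars = body.chars();
        match chars.next() {
            Some(c) if c.is_ascii_alphabetic() || c == '_' || c == ':' => {
                chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | ':' | '-' | '.'))
            }
            _ => false,
        }
    }
}

fn main() {
    // The value from the issue is rejected at its leading '<'.
    println!("{:?}", check_att_value("<%asm_group_unsubscribe_raw_url%>")); // Err(0)
    // The escaped form matches the production.
    println!("{:?}", check_att_value("&lt;%asm_group_unsubscribe_raw_url%&gt;")); // Ok(())
}
```

A single-quoted value would be checked the same way with `'` in place of `"` as the forbidden delimiter.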