RazrFalcon / xmlparser

A low-level, pull-based, zero-allocation XML 1.0 parser.
Apache License 2.0

Access to current depth and text position #24

Closed: gwenn closed this issue 1 year ago

gwenn commented 1 year ago

I am trying to implement an API similar to libxml2-xmlreader / xmltextreader / XMLStreamReader using xmlparser::Tokenizer, but I am missing access to the current depth: https://github.com/RazrFalcon/xmlparser/blob/7334c08aabd21463507a77a288924ea8587865e4/src/lib.rs#L338 and to the text position (the Tokenizer's stream.gen_text_pos()) for error reporting. Would you mind giving access to these two pieces of state? Thanks.

RazrFalcon commented 1 year ago

depth and stream position are internal for a reason. You're not supposed to use or rely on them. You can keep track of depth on your side, and each token already has a position.

gwenn commented 1 year ago

libxml2, C#, and Java give access to the current depth.

For the text position, I guess I can copy and paste https://github.com/RazrFalcon/roxmltree/blob/adec50c7361cf1dfbbea03bc928d1d20728f230d/src/lib.rs#L177-L180
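As a rough illustration of what deriving a text position on the caller's side can look like, here is a minimal sketch that turns a byte offset (such as the start of a token's span) into a 1-based row and column. The `TextPos` struct and the `text_pos_at` function are hypothetical names chosen for this example; this is not the code from the linked roxmltree lines, and it rescans the input on every call, so it is only suitable for error paths.

```rust
/// A 1-based row/column pair. Hypothetical type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct TextPos {
    row: u32,
    col: u32,
}

/// Computes the row and column of a byte `offset` inside `text`.
/// Offsets past the end are clamped, and an offset that falls inside a
/// multi-byte character is moved back to the start of that character.
fn text_pos_at(text: &str, offset: usize) -> TextPos {
    let mut end = offset.min(text.len());
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    let before = &text[..end];

    // Row: one more than the number of newlines before the offset.
    let row = before.bytes().filter(|&b| b == b'\n').count() as u32 + 1;

    // Column: characters (not bytes) since the start of the current line.
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let col = before[line_start..].chars().count() as u32 + 1;

    TextPos { row, col }
}
```

For the input `"<a>\n  <b/>\n</a>"`, the `<b/>` element starts at byte offset 6, which this function maps to row 2, column 3.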

RazrFalcon commented 1 year ago

> libxml2, C#, and Java give access to the current depth.

And? Those libraries simply have a different API to xmlparser.

Why would you need depth and position to begin with? It doesn't make much sense when working with XML entities. You can see how roxmltree handles it.

gwenn commented 1 year ago

Depth is used when you need to match an ElementEnd with an ElementStart, for example to implement xmlTextReaderNext / skip / skipElement or ReadSubtree.

Current draft (implemented directly in xmlparser crate): https://github.com/gwenn/xmlparser/blob/1da8b12129b1dd433c78159b3eb9b74dcc6f0837/src/reader.rs#L216-L218

> And text position (Tokenizer's stream.gen_text_pos()) for error report.

(Like roxmltree)

RazrFalcon commented 1 year ago

> Depth is used when you need to match an ElementEnd with an ElementStart.

You should count this on your side, because entity references could introduce elements as well. The depth xmlparser stores is not what you need/want. It shouldn't really be there to begin with.
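Counting depth on the caller's side could be sketched roughly as follows. The `Tok` enum is a simplified, hypothetical stand-in for the element-related tokens a tokenizer emits (xmlparser's real `Token` also carries names and spans), and `DepthReader` and `skip_element` are illustrative names, not part of any library. The sketch does not expand entity references, so elements introduced that way would need to be fed through the same counter by the wrapper.

```rust
/// Simplified stand-in for tokenizer output. Hypothetical for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tok {
    /// `<name`: the beginning of an opening tag.
    Start,
    /// `>`: the opening tag is complete and the element has content.
    Open,
    /// `/>`: the element is self-closing.
    Empty,
    /// `</name>`: a closing tag.
    Close,
    /// Attributes, text, comments, and so on.
    Other,
}

/// Wraps a token iterator and counts nesting depth outside the tokenizer.
struct DepthReader<I: Iterator<Item = Tok>> {
    tokens: I,
    depth: usize,
}

impl<I: Iterator<Item = Tok>> DepthReader<I> {
    fn new(tokens: I) -> Self {
        Self { tokens, depth: 0 }
    }

    /// Number of elements that are currently open.
    fn depth(&self) -> usize {
        self.depth
    }

    /// Returns the next token and updates the depth counter.
    fn next_token(&mut self) -> Option<Tok> {
        let tok = self.tokens.next()?;
        match tok {
            Tok::Open => self.depth += 1,
            Tok::Close => self.depth = self.depth.saturating_sub(1),
            _ => {}
        }
        Some(tok)
    }

    /// Call right after receiving `Tok::Start`. Consumes the rest of that
    /// element, including all descendants. Returns `false` if the input
    /// ends before the element is closed.
    fn skip_element(&mut self) -> bool {
        let base = self.depth;
        while let Some(tok) = self.next_token() {
            // The element ends at `/>` or `</name>` seen at its own level.
            if matches!(tok, Tok::Empty | Tok::Close) && self.depth == base {
                return true;
            }
        }
        false
    }
}
```

With the token sequence for `<a><b><c/></b><d/></a>`, calling `skip_element` right after the `Start` of `b` consumes everything up to and including `</b>`, so the next token returned is the `Start` of `d` at depth 1.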

RazrFalcon commented 1 year ago

Simply treat xmlparser as a low-level tokenizer. It's not a complete "parser", but a foundation for a parser. So it's perfectly normal to have some wrapper code on your side.

https://github.com/RazrFalcon/xmlparser#why-a-new-library

gwenn commented 1 year ago

OK, I managed to reimplement them on my side (by duplicating Token::span()): https://github.com/gwenn/xmlreader/blob/5248ccd05badd0dd414546b526d91c34ec5e18e7/src/lib.rs#L117-L154