MichalStrehovsky / iltrim

Fix descriptor bug #110

Closed agocke closed 2 years ago

MichalStrehovsky commented 2 years ago

Thanks!

I thought XmlReader would be an okay parser, but it results in an utterly confusing parsing experience.

Maybe XPath (which IL Linker uses for this) is better, but I didn't want to bring in something that heavyweight.

I understand why parsing XML in its entirety needs to be difficult, but I still can't believe .NET doesn't have a simple, easy-to-use reader for a subset of XML that covers pretty much 95% of all use cases (just tags, attributes, and comments; no DTD, namespaces, or other garbage).

Even this XmlReader-based implementation already shows up as pretty expensive in profiles and we'll want to move this to the multi-threaded phase. XPath probably won't make that better.

Cc @vitek-karas and @sbomer for my rant :)

vitek-karas commented 2 years ago

XmlReader is actually pretty good for speed. 10 years ago when I was working on it, it was on par with most of the really fast native XML parsers.

It would be interesting to see which part is taking up time. The descriptor is ~50KB - which is not small, but it should be barely noticeable.

XPath is definitely slower - not too much on the CPU (probably only 2x or less), but it will have a much worse memory footprint. XmlReader is streaming (basically no caching), whereas XPath caches the entire document - it is relatively efficient at doing that, but still.
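
To make the streaming vs. cached distinction concrete, here is a rough sketch. It uses Python's standard library purely as a stand-in, not the .NET `XmlReader` or XPath APIs discussed here, and the descriptor content and function names are made up for illustration. The first function visits elements as they are parsed and discards them; the second builds the whole tree in memory before querying it with a path expression.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical descriptor, loosely shaped like a linker descriptor file.
DESCRIPTOR = """<linker>
  <!-- comments are ignored by both approaches -->
  <assembly fullname="LibA"><type fullname="LibA.Foo" /></assembly>
  <assembly fullname="LibB"><type fullname="LibB.Bar" /></assembly>
</linker>"""


def assemblies_streaming(xml_text):
    """Streaming pass: handle each <assembly> as it closes, then discard it."""
    names = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "assembly":
            names.append(elem.get("fullname"))
            elem.clear()  # drop the subtree we no longer need
    return names


def assemblies_cached(xml_text):
    """Document pass: build the whole tree first, then query it with a path."""
    root = ET.fromstring(xml_text)
    return [a.get("fullname") for a in root.findall("./assembly")]
```

Both functions return the same assembly names for this input. The difference is in what stays in memory while they run, which is the footprint concern described above.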

The XmlReader API can be a bit confusing - but I would not necessarily say that XPath is much better. All the really good XML APIs are worse perf-wise. XLinq (which is probably the best we have) is definitely at least as bad as XPath, probably somewhat worse.