NaturalIntelligence / fast-xml-parser

Validate XML, Parse XML and Build XML rapidly without C/C++ based libraries and no callback.
https://naturalintelligence.github.io/fast-xml-parser/
MIT License
2.49k stars, 302 forks

Add support for ignoring < inside stop nodes #499

Open nullcatalyst opened 1 year ago

nullcatalyst commented 1 year ago

This is useful for parsing input such as

<script>if (a < b) {}</script>

Without it, the < is treated as the start of a new tag, and the parser throws an exception when it cannot find a corresponding > to close that tag.

Purpose / Goal

Ideally, I'd like to use this library to parse a small subset of HTML. One problem I found is that it doesn't handle < inside <script> tags well.

Currently it treats < as the start of an opening tag, then looks for a corresponding > to close that tag. Sometimes this is desirable: when parsing <pre> tags, for example, the contents should still be valid HTML.

It's just <script> that is the oddball here. That is also why this behavior cannot simply be applied to all options.stopNodes, so a new options field had to be added. I went with options.ignoreTagsInNodes, though I am not attached to that name at all. If you have a better one, please change it :)
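To illustrate the idea (this is only a rough sketch, not the code in this PR and not part of fast-xml-parser's API): for a node whose inner tags should be ignored, the scanner can skip straight to the matching closing tag instead of interpreting every <. The helper name readRawText and its return shape are hypothetical, and the matching is case-sensitive for simplicity.

```javascript
// Hypothetical helper, not part of fast-xml-parser.
// Given the index just past an opening tag such as <script>, return the raw
// text up to the matching closing tag without interpreting any "<" inside it.
function readRawText(xml, tagName, contentStart) {
  const closeTag = "</" + tagName;
  let i = xml.indexOf(closeTag, contentStart);
  while (i !== -1) {
    // The character after the tag name must be ">" or whitespace,
    // so that "</scripts>" is not mistaken for "</script>".
    const next = xml[i + closeTag.length];
    if (next === ">" || (next !== undefined && /\s/.test(next))) {
      const end = xml.indexOf(">", i + closeTag.length);
      if (end === -1) break; // closing tag is never terminated
      return { text: xml.slice(contentStart, i), nextIndex: end + 1 };
    }
    i = xml.indexOf(closeTag, i + 1);
  }
  throw new Error("Unclosed raw-text node: <" + tagName + ">");
}

const xml = "<script>if (a < b) {}</script><p>ok</p>";
const result = readRawText(xml, "script", "<script>".length);
console.log(result.text); // prints: if (a < b) {}
```

Parsing would then resume at result.nextIndex, so the <p> that follows is still handled as a normal tag.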



amitguptagwl commented 1 year ago

Let me understand your changes. I'll try to respond as soon as possible from my side.

amitguptagwl commented 1 year ago

Probably, we can rename checkNodePathMatch to isNodeInPaths.
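For readers following along, a helper with the suggested name might look roughly like the sketch below. This is an assumption about its shape, not the code under review: the parameter names are hypothetical, and it assumes paths are written either as a full dotted path (such as "html.body.script") or with the "*.tagName" wildcard form that stopNodes accepts.

```javascript
// Hypothetical sketch of a path-matching helper; the real function in the PR
// may take different arguments or apply different matching rules.
function isNodeInPaths(paths, jPath, tagName) {
  const wildcard = "*." + tagName; // matches the tag at any depth
  return paths.some((p) => p === jPath || p === wildcard);
}

console.log(isNodeInPaths(["*.script"], "html.body.script", "script"));
console.log(isNodeInPaths(["html.head.script"], "html.body.script", "script"));
```

The first call matches through the wildcard entry; the second does not match because the full path differs.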