NaturalIntelligence / fast-xml-parser

Validate XML, Parse XML and Build XML rapidly without C/C++ based libraries and no callback.
https://naturalintelligence.github.io/fast-xml-parser/
MIT License

Some character reference formats not parsed correctly #568

Open Pharb opened 1 year ago

Pharb commented 1 year ago

Description

According to the [XML spec](https://www.w3.org/TR/xml/#dt-charref), character references may contain leading zeros, and hexadecimal references may use lower-case letters. I ran into this parsing issue while consuming XML from a proprietary third-party tool, which writes `<` as character references in these forms.

Input

<?xml version="1.0"?>
<tests>
  <test>&lt;</test>
  <test>&#60;</test>
  <test>&#060;</test>
  <test>&#0060;</test>
  <test>&#x3C;</test>
  <test>&#x03C;</test>
  <test>&#x003C;</test>
  <test>&#x3c;</test>
  <test>&#x03c;</test>
  <test>&#x003c;</test>
</tests>

Code

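const { XMLParser } = require("fast-xml-parser");
// xmlData is the XML string shown under Input; passing `true` validates it before parsing.
const parser = new XMLParser();
const result = parser.parse(xmlData, true);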

Output

{
    "?xml": "",
    "tests": {
        "test": [
            "<",
            "<",
            "&#060;",
            "&#0060;",
            "<",
            "&#x03C;",
            "&#x003C;",
            "&#x3c;",
            "&#x03c;",
            "&#x003c;"
        ]
    }
}

Expected data

{
    "?xml": "",
    "tests": {
        "test": [
            "<",
            "<",
            "<",
            "<",
            "<",
            "<",
            "<",
            "<",
            "<",
            "<"
        ]
    }
}
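
For reference, the expected data amounts to decoding every `&#…;` and `&#x…;` reference regardless of leading zeros or the letter case of the hex digits. Below is a minimal sketch of that behaviour. `decodeCharRefs` and `CHAR_REF` are hypothetical names for a standalone helper; they are not part of fast-xml-parser and do not reflect how the library handles entities internally. The sketch also skips the spec's check that the code point is a legal XML `Char`.

```javascript
// Hypothetical helper for illustration only; not a fast-xml-parser API.
// Matches decimal (&#60;) and hexadecimal (&#x3c;) character references.
// Any number of leading zeros is allowed, and hex digits may be upper or lower case.
// The "x" prefix itself must be lower case, as the XML spec requires.
const CHAR_REF = /&#(?:x([0-9a-fA-F]+)|([0-9]+));/g;

function decodeCharRefs(text) {
  return text.replace(CHAR_REF, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Values beyond the Unicode range are left as-is instead of throwing.
    if (codePoint > 0x10ffff) return match;
    return String.fromCodePoint(codePoint);
  });
}

// The nine numeric forms from the Input above.
const samples = [
  "&#60;", "&#060;", "&#0060;",
  "&#x3C;", "&#x03C;", "&#x003C;",
  "&#x3c;", "&#x03c;", "&#x003c;",
];
console.log(samples.map(decodeCharRefs)); // every entry is "<"
```

Named entities such as `&lt;` are deliberately left alone by this sketch, since only the numeric forms are affected here.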

github-actions[bot] commented 1 year ago

We're glad you find this project helpful. We'll try to address this issue ASAP. You can visit https://solothought.com to learn about recent features. Don't forget to star this repo.