NaturalIntelligence / fast-xml-parser

Validate XML, Parse XML and Build XML rapidly without C/C++ based libraries and no callback.
https://naturalintelligence.github.io/fast-xml-parser/
MIT License
2.43k stars 297 forks

Add support for parsing HTML numeric entities #645

Closed DerZade closed 3 months ago

DerZade commented 4 months ago

Purpose / Goal

This PR adds an htmlNumericEntities option that enables parsing of HTML numeric entities.

Type

Please mention the type of PR

Benchmarks

Before

Running Suite: XML Parser benchmark
fxp v3 : 43885.453114704054 requests/second
fxp : 28444.884306095257 requests/second
fxp - preserve order : 35582.44370634719 requests/second
xmlbuilder2 : 12279.374624117605 requests/second
xml2js  : 17997.982194339696 requests/second

After

Running Suite: XML Parser benchmark
fxp v3 : 56829.18072321323 requests/second
fxp : 35711.03397186936 requests/second
fxp - preserve order : 35048.53534051172 requests/second
xmlbuilder2 : 12565.714384084644 requests/second
xml2js  : 17898.357168570532 requests/second

amitguptagwl commented 4 months ago

Thanks for your PR.

I have a few suggestions:

  1. I believe we don't need another option. This can be handled by the existing htmlEntities flag.
  2. We can add an extra property to the htmlEntities object, say num or hex, whose value is a function. In the loop, we can check the type of the value: if it is a string, replace with it as we do today; otherwise, call the function configured in the object. This also makes the feature easy to disable.
  3. Both expressions (e.g. /&#([0-9]+);/) use +, which makes them unsafe to match against very long strings and will hurt performance too.
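
The three suggestions above can be sketched roughly as follows. This is a standalone illustration, not the library's code: the table name sketchEntities, the helpers fromCodePointSafe and replaceEntities, and the digit limits {1,7} and {1,6} are all assumptions made for the example.

```javascript
// Hypothetical sketch: an entity table whose values are either strings or
// functions, with bounded quantifiers instead of "+".
const MAX_CODE_POINT = 0x10ffff;

// Convert captured digits to a character; keep the original text when the
// number is outside the valid Unicode range (String.fromCodePoint would throw).
function fromCodePointSafe(match, digits, radix) {
  const codePoint = Number.parseInt(digits, radix);
  return codePoint <= MAX_CODE_POINT ? String.fromCodePoint(codePoint) : match;
}

// Illustrative table (assumed shape). Removing the num_dec or num_hex entry
// disables that kind of numeric entity.
const sketchEntities = {
  copy: { regex: /&copy;/g, val: "©" },
  num_dec: {
    regex: /&#([0-9]{1,7});/g, // at most 7 decimal digits
    val: (match, digits) => fromCodePointSafe(match, digits, 10),
  },
  num_hex: {
    regex: /&#x([0-9a-fA-F]{1,6});/g, // at most 6 hex digits
    val: (match, digits) => fromCodePointSafe(match, digits, 16),
  },
};

function replaceEntities(text, table = sketchEntities) {
  let result = text;
  for (const name of Object.keys(table)) {
    const { regex, val } = table[name];
    result =
      typeof val === "function"
        ? result.replace(regex, (match, digits) => val(match, digits))
        : result.replace(regex, () => val); // plain string replacement
  }
  return result;
}

console.log(replaceEntities("&copy; &#65; &#x42;"));
```

Because the quantifiers are bounded, an entity with too many digits simply does not match and is left in the text as-is, so the amount of work per "&#" is capped.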

Let me know your thoughts

DerZade commented 4 months ago

@amitguptagwl I just force-pushed an update to my code that reflects your suggestions:

coveralls commented 3 months ago

Coverage Status

coverage: 98.257% (+0.006%) from 98.251% when pulling b2f3c965d094b50ac0d05d67483489100afa2514 on DerZade:master into 072b2b0c148ae2fcb087f08d740382b9897f81cf on NaturalIntelligence:master.

amitguptagwl commented 3 months ago

It's live now. You can see the entry in the changelog.

DerZade commented 3 months ago

Thank you so much!