commoncrawl / ia-web-commons

Web archiving utility library
Apache License 2.0

WAT: only unescape complete XML/HTML character entities (fixes #19) #20
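For illustration only, here is a minimal sketch of what "only unescape complete entities" can mean in practice: an entity is decoded only when it is terminated by `;`, and anything else (for example `&amp=2` inside a URL query string) is copied through untouched. The class name `EntityUnescapeSketch`, the method `unescapeComplete`, and the tiny entity table are hypothetical and are not taken from ia-web-commons or from this pull request.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the code from this pull request.
public class EntityUnescapeSketch {

    // Small illustrative subset of named entities; a real HTML parser knows far more.
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'");

    // Matches only complete entities: '&', a name or numeric reference, and a closing ';'.
    private static final Pattern COMPLETE_ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]{1,6}|#[0-9]{1,7}|[A-Za-z][A-Za-z0-9]*);");

    public static String unescapeComplete(String input) {
        Matcher m = COMPLETE_ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = decode(m.group(1));
            // Unknown names and invalid code points are copied through unchanged.
            if (replacement == null) {
                replacement = m.group();
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the decoded text, or null if the entity body is not recognized.
    private static String decode(String body) {
        if (body.charAt(0) != '#') {
            return NAMED.get(body);
        }
        boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
        int cp = hex
                ? Integer.parseInt(body.substring(2), 16)
                : Integer.parseInt(body.substring(1));
        boolean surrogate = cp >= 0xD800 && cp <= 0xDFFF;
        if (cp == 0 || surrogate || !Character.isValidCodePoint(cp)) {
            return null;
        }
        return new String(Character.toChars(cp));
    }
}
```

With this sketch, `unescapeComplete("a &amp; b")` yields `a & b`, while `unescapeComplete("?x=1&amp=2")` returns its input unchanged because the entity is not terminated by `;`.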

Closed sebastian-nagel closed 4 years ago

sebastian-nagel commented 4 years ago

After running more tests: use jsoup's Parser.unescapeEntities(...) instead which