clear-code / xmlua

A user-friendly XML/HTML processing library for Lua based on libxml2
https://clear-code.github.io/xmlua/
MIT License

Failed to parse XML: internal. Error: Huge input lookup #18

Closed vadzimsharai closed 4 years ago

vadzimsharai commented 4 years ago

>>> xmlua.XML.parse(xml_doc)


Version: 1.1.4

kou commented 4 years ago

Could you provide an XML file that reproduces this case?

vadzimsharai commented 4 years ago

You can use this Python script to generate a similar XML file.
As I understand it, this happens when the XML contains Unicode (non-ASCII) characters.

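# Generate a large UTF-8 XML file that reproduces the parse error.
NODES_COUNT = 100000
CHUNK_SIZE = 1024 * 1024  # characters written per call

# 300 non-ASCII characters (U+03E8..U+0513) for each description.
text = ''.join(chr(1000 + i) for i in range(300))

# One node template, repeated NODES_COUNT times.
data = '''
    <node attr="Attr">
        <title attr="Attr">Title</title>
        <description attr="Attr">%s</description>
    </node>
''' % text * NODES_COUNT

doc = '''<?xml version="1.0" encoding="UTF-8"?>
<root>
%s
</root>
''' % data

# Write as UTF-8 explicitly so the file matches its XML declaration.
with open("example.xml", "w", encoding="utf-8") as f:
    for i in range(0, len(doc), CHUNK_SIZE):
        f.write(doc[i:i + CHUNK_SIZE])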
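
To get a feel for how large the generated file is, the rough estimate below computes the UTF-8 size of the repeated nodes. It is only a sketch: `LOOKUP_LIMIT` is an assumed value taken from libxml2's `XML_MAX_LOOKUP_LIMIT` (10,000,000 bytes), and whether that particular limit is what triggers "Huge input lookup" in this case is also an assumption.

```python
# Estimate the UTF-8 size of the XML produced by the generator script.
NODES_COUNT = 100000
LOOKUP_LIMIT = 10_000_000  # assumed: libxml2 XML_MAX_LOOKUP_LIMIT, in bytes

# Same description text as the generator: code points 1000..1299.
description = ''.join(chr(1000 + i) for i in range(300))
node = '''
    <node attr="Attr">
        <title attr="Attr">Title</title>
        <description attr="Attr">%s</description>
    </node>
''' % description

chars_per_node = len(node)                  # size in characters
bytes_per_node = len(node.encode("utf-8"))  # size in UTF-8 bytes
total_bytes = bytes_per_node * NODES_COUNT  # nodes only, without <root>

print("characters per node:", chars_per_node)
print("bytes per node:", bytes_per_node)
print("total bytes:", total_bytes)
print("above assumed lookup limit:", total_bytes > LOOKUP_LIMIT)
```

Each of the 300 description characters takes two bytes in UTF-8, so the byte size of a node is noticeably larger than its character count, and the whole document is far larger than the assumed limit.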

vadzimsharai commented 4 years ago

My solution: add the XML_PARSE_HUGE parse option to the xml2.xmlCtxtReadMemory call.
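
libxml2 parse options are bit flags combined into a single integer, so adding XML_PARSE_HUGE means OR-ing it into whatever options are already passed. The sketch below illustrates only that arithmetic; the flag values are those of libxml2's `xmlParserOption` enum, while `with_huge` and the choice of base options are hypothetical and not xmlua's actual code.

```python
# Flag values from libxml2's xmlParserOption enum (parser.h).
XML_PARSE_RECOVER = 1 << 0
XML_PARSE_NONET = 1 << 11
XML_PARSE_HUGE = 1 << 19


def with_huge(options: int) -> int:
    """Return options with XML_PARSE_HUGE set (hypothetical helper)."""
    return options | XML_PARSE_HUGE


# Illustrative base options only; not what xmlua passes by default.
base = XML_PARSE_RECOVER | XML_PARSE_NONET
options = with_huge(base)

print(hex(base), hex(options))
```

Because the flag is OR-ed in, applying it twice is harmless and the other options are left untouched.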


kou commented 4 years ago

Thanks. I've added a feature to pass XML_PARSE_HUGE:

xmlua.XML.parse(xml, {parse_options = {"default", "huge"}})