Closed ArneDefauw closed 3 years ago
Thanks for reporting! I added an option to load_cas_from_xmi that you can use; see https://github.com/dkpro/dkpro-cassis#large-xmi-files. It is in master and will be in the next release.
I released a new version with the fix.
Describe the bug
Cannot deserialize a large XMI file created using UIMA.
To Reproduce
Steps to reproduce the behavior:
When trying to deserialize the following XMI file: https://drive.google.com/file/d/1WZS3Ep67O7BluLBd4NANrQXKanmNakYk/view?usp=sharing
with this type system: https://drive.google.com/file/d/1hJVC9wepQAoYhMteEaXMPFnhQX2OZU0I/view?usp=sharing
Via:
I get the following error message:
  File "/miniconda/lib/python3.7/site-packages/cassis/xmi.py", line 42, in load_cas_from_xmi
    return deserializer.deserialize(source, typesystem=typesystem, lenient=lenient)
  File "/miniconda/lib/python3.7/site-packages/cassis/xmi.py", line 75, in deserialize
    for event, elem in context:
  File "src/lxml/iterparse.pxi", line 209, in lxml.etree.iterparse.next
  File "src/lxml/iterparse.pxi", line 194, in lxml.etree.iterparse.next
  File "src/lxml/iterparse.pxi", line 229, in lxml.etree.iterparse._read_more_events
  File "src/lxml/parser.pxi", line 1384, in lxml.etree._FeedParser.feed
  File "src/lxml/parser.pxi", line 606, in lxml.etree._ParserContext._handleParseResult
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
  File ".../large.xmi", line 85622
    <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text" sofaString="<div id="text" class="panel-body"> <div id="textTabContent"> <div id=&q
...
XMLSyntaxError: internal error: Huge input lookup, line 85622, column 5
The error is probably caused by lxml's limit on huge inputs, as discussed in https://stackoverflow.com/questions/48984325/lxml-etree-xmlsyntaxerror-internal-error-huge-input-lookup and https://stackoverflow.com/questions/11850345/using-python-lxml-etree-for-huge-xml-files, and is triggered at line 69 in https://github.com/dkpro/dkpro-cassis/blob/master/cassis/xmi.py
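For context, here is a minimal sketch of the lxml setting that the linked Stack Overflow answers point to. It assumes lxml is installed and only shows that `etree.iterparse` accepts a `huge_tree` keyword, which lifts the parser's safety limits on very large text content. The function `count_elements` and its `trusted` flag are hypothetical names for illustration, not the cassis API and not necessarily how the fix was implemented.

```python
import io

from lxml import etree


def count_elements(source, trusted=False):
    """Stream over an XML source and count its elements.

    `trusted` is a hypothetical flag for this sketch. When True, it is
    passed to lxml as huge_tree=True, which disables the parser's
    security limits on very large content. Only use it for trusted input.
    """
    context = etree.iterparse(source, events=("end",), huge_tree=trusted)
    count = 0
    for _event, elem in context:
        count += 1
        elem.clear()  # drop the content of elements already processed
    return count


if __name__ == "__main__":
    # Tiny stand-in document; a real XMI file would be opened in binary mode.
    xml = b'<root><sofa sofaString="some text"/><item/></root>'
    print(count_elements(io.BytesIO(xml), trusted=True))
```

Leaving the flag off by default keeps the parser's protection against maliciously large documents, so enabling it is an explicit decision by the caller.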