Open rillian opened 5 years ago
The problem occurs with general XML parsing failures, e.g. the unrecognized § entity on line 776 of `tlg0004.tlg001.perseus-eng1.xml` from canonical-greekLit.
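For illustration, here is a minimal sketch of how an undeclared entity turns into a generic parse failure. It uses the standard library's `xml.etree.ElementTree` as a stand-in for `lxml`, and `check_well_formed` is a hypothetical helper, not part of HookTest. The `&sect;` reference is only an assumed example of an undeclared entity; the actual content of line 776 is not reproduced here.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses, else a one-line error description."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Report the failure instead of letting it propagate as a crash.
        return f"XML parsing failed: {exc}"
    return None


good = "<p>section 1</p>"
bad = "<p>&sect; 1</p>"  # entity reference with no declaration (assumed example)

print(check_well_formed(good))
print(check_well_formed(bad))
```

The point of the sketch is only that a well-formedness error can be caught and reported as a test result before any CapiTainS-specific checks run.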
Yes, this seems like something that would need work. The XML parsing vs. Capitains Parsing is something that has remained in the codebase for a long time. Feel free to propose a fix, including by creating a new exception :)
Some logging output got into my TEI files, and hooktest asserts rather than reporting the error:
One may reproduce this by prepending the string `'Garbage text\n'` to the beginning of e.g. `tests/repo1/data/hafez/divan/hafez.divan.perseus-eng1.xml`. The `XMLSyntaxError` is hidden by the `imap_unordered` call through the threadpool and presents instead as a `MaybeEncodingError`, because `lxml.etree` can't pickle its `_ListErrorLog`. Flattening the parallel iterator to a serial one reveals the underlying issue.
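The masking effect can be sketched without `lxml` or HookTest. In the example below, `FakeSyntaxError` is a hypothetical stand-in for `XMLSyntaxError`, and a `threading.Lock` plays the role of the unpicklable error log. `parse_for_pool` is likewise a hypothetical wrapper showing one possible fix: convert the exception to plain data inside the worker, so that a pool which pickles its results has something it can send back.

```python
import pickle
import threading


class FakeSyntaxError(Exception):
    """Hypothetical stand-in for a parser error that carries an unpicklable log."""

    def __init__(self, message):
        super().__init__(message)
        # A lock cannot be pickled; it plays the role of the error log here.
        self.error_log = threading.Lock()


def parse(text):
    """Toy parser: reject anything that does not start with '<'."""
    if not text.startswith("<"):
        raise FakeSyntaxError("Start tag expected, '<' not found")
    return "ok"


def parse_for_pool(text):
    """Worker wrapper: turn the exception into a plain, picklable tuple."""
    try:
        return (True, parse(text))
    except FakeSyntaxError as exc:
        return (False, f"{type(exc).__name__}: {exc}")


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return False
    return True


broken = "Garbage text\n<TEI/>"

try:
    parse(broken)
except FakeSyntaxError as exc:
    print("raw exception picklable:", is_picklable(exc))

result = parse_for_pool(broken)
print("wrapped result picklable:", is_picklable(result))
print(result)
```

Returning a `(success, message)` tuple is just one option; a dedicated picklable exception that stores only the message string would serve the same purpose.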