PRImA-Research-Lab / prima-core-libs

Core libraries by the PRImA Research Lab
Apache License 2.0

reader/validation: throw informative exception #12

Open bertsky opened 3 years ago

bertsky commented 3 years ago

I sometimes have trouble debugging PAGE-XML documents that just won't open in PageViewer, even though they validate against the schema and contain no obvious mistake. The problem is that PageViewer won't tell you what went wrong (unless it crashes outright, in which case you at least get a stack trace).

I dug into /PrimaDla/src/org/primaresearch/dla/page/io/xml/XmlPageReader.java and found that XmlPageReader.read() does have all the information in a PageErrorHandler instance called lastErrors. But this gets thrown away.

Why is this not piggy-backed on an exception which PageViewer's event listener can then react to?

For example, it would help to see (at least on the console):

There is no ID/IDREF binding for IDREF 'region0015'
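
A minimal sketch of the idea, not the actual prima-core-libs code: the names CollectingErrorHandler, PageReadException and InformativeReader are hypothetical, and the sketch uses the JDK's DTD-validating parser on a toy document rather than the PAGE schema. It only shows how messages collected by a SAX ErrorHandler could travel on an exception instead of being dropped.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical handler that records every problem the parser reports. */
class CollectingErrorHandler implements ErrorHandler {
    private final List<String> messages = new ArrayList<>();

    @Override public void warning(SAXParseException e) { add("warning", e); }
    @Override public void error(SAXParseException e) { add("error", e); }
    @Override public void fatalError(SAXParseException e) throws SAXParseException {
        add("fatal", e);
        throw e; // parsing cannot continue after a fatal error
    }

    private void add(String level, SAXParseException e) {
        messages.add(level + " at line " + e.getLineNumber() + ": " + e.getMessage());
    }

    List<String> getMessages() { return new ArrayList<>(messages); }
}

/** Hypothetical exception that carries the collected messages to the caller. */
class PageReadException extends Exception {
    private final List<String> problems;

    PageReadException(List<String> problems) {
        super(String.join(System.lineSeparator(), problems));
        this.problems = problems;
    }

    List<String> getProblems() { return problems; }
}

class InformativeReader {
    /** Parses and validates the XML; throws if the parser reported anything. */
    static void read(String xml) throws PageReadException {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // DTD validation, standing in for schema validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(handler);
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // Fatal parse errors are already recorded by the handler.
            if (handler.getMessages().isEmpty()) {
                problems.add(e.toString());
            }
        }
        problems.addAll(0, handler.getMessages());
        if (!problems.isEmpty()) {
            throw new PageReadException(problems);
        }
    }

    public static void main(String[] args) {
        // Toy document with a dangling IDREF (assumption: not a real PAGE-XML file).
        String xml = "<!DOCTYPE page [<!ELEMENT page (region*)><!ELEMENT region EMPTY>"
                + "<!ATTLIST region id ID #IMPLIED ref IDREF #IMPLIED>]>"
                + "<page><region id=\"r1\" ref=\"region0015\"/></page>";
        try {
            read(xml);
            System.out.println("document is valid");
        } catch (PageReadException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

A listener in the viewer could then catch the exception and print or display getProblems(), rather than receiving a bare null.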

bertsky commented 1 year ago

Writing code for PAGE-XML that builds on prima-core-libs is difficult without informative error messages. As a user, I frequently get null even with files that validate perfectly under libxml2 (xmllint, xmlstarlet, Python lxml, etc.), which is not as strict as the parser used here.

Any news here @chris1010010?

stweil commented 8 months ago

@bertsky, please add an example document which does not open in PageViewer and which can be used to test your pull request.