PRImA-Research-Lab / prima-page-converter

Command-line tool to convert page layout files to the latest PAGE XML format. It supports all previous versions of the PAGE format as well as ALTO XML, FineReader XML, and hOCR.
Apache License 2.0

Error writing target PAGE XML file #9

Closed: ghost closed this issue 5 years ago

ghost commented 5 years ago

@chris1010010 When I try to convert an ALTO XML file to PAGE XML, I get the following error:

Error writing target PAGE XML file

I have validated my ALTO XML files, and they pass validation using:

wget https://raw.githubusercontent.com/kermitt2/pdfalto/master/schema/alto.xsd
xmllint --noout --schema alto.xsd output.xml
output.xml validates

My files are attached: test.zip

Looking forward to your reply.

chris1010010 commented 5 years ago

Hi, the problem is that the file contains some negative coordinates, which are not allowed in PAGE XML. Admittedly, the converter should adjust or remove them.

ghost commented 5 years ago

@chris1010010 Thanks for the hint, I was able to solve it. I hope you will add an option to the converter for removing negative coordinates. Keep up the good work.