PRImA-Research-Lab / prima-page-converter

Command-line tool to convert page layout files to the latest PAGE XML format. It supports all previous versions of the PAGE format as well as ALTO XML, FineReader XML, and HOCR.
Apache License 2.0

negative input points #10

Closed. ghost closed this issue 4 years ago

ghost commented 4 years ago

@chris1010010 Sometimes the input ALTO file contains points with negative coordinates (for example ="-90), and in that case the converter fails.

Suggested solution: add an option for handling negative points.
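
For anyone who wants to check an input file before converting, here is a minimal Python sketch that lists elements with negative position values. It is not part of the converter. It assumes the affected attributes are ALTO's HPOS and VPOS (the report above does not name the attribute), and find_negative_coords is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumption: the negative values sit in ALTO's HPOS/VPOS attributes.
# Extend this tuple if your files carry coordinates elsewhere.
COORD_ATTRS = ("HPOS", "VPOS")


def find_negative_coords(alto_xml):
    """Return (tag, ID, attribute, value) for every negative coordinate found."""
    root = ET.fromstring(alto_xml)
    hits = []
    for elem in root.iter():
        # Strip the "{namespace}" prefix so the check works for any ALTO version.
        tag = elem.tag.rsplit("}", 1)[-1]
        for name in COORD_ATTRS:
            value = elem.get(name)
            if value is None:
                continue
            try:
                number = float(value)
            except ValueError:
                continue  # not numeric, so not a coordinate we can judge
            if number < 0:
                hits.append((tag, elem.get("ID"), name, number))
    return hits


sample = (
    '<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">'
    '<Layout><Page><PrintSpace>'
    '<TextBlock ID="b1" HPOS="-90" VPOS="10"/>'
    '<TextBlock ID="b2" HPOS="5" VPOS="7"/>'
    '</PrintSpace></Page></Layout></alto>'
)
for hit in find_negative_coords(sample):
    print(hit)
```

An empty result means none of the checked attributes is negative; it says nothing about coordinates stored in other attributes.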

chris1010010 commented 4 years ago

Hi, which option would be most useful in your case? Remove objects with negative coordinates? Try to cut off at 0? PAGE forbids negative coordinates, so some form of correction is needed.

ghost commented 4 years ago

hmmmmm.... can you for now,

chris1010010 commented 4 years ago

Added the -neg-coords command line option with two modes: removeObj (skip any object that has one or more negative points) and toZero (change negative values to zero).
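
To illustrate the difference between the two modes, here is a rough Python sketch of the behaviour as described above. It is not the converter's actual code: the function name fix_negative_coords and the representation of an object as a list of (x, y) points are assumptions made for the example.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int]


def fix_negative_coords(points: List[Point], mode: str) -> Optional[List[Point]]:
    """Apply one of the two modes to a single object's outline.

    Returns the (possibly corrected) points, or None when the object
    should be skipped entirely.
    """
    has_negative = any(x < 0 or y < 0 for x, y in points)
    if not has_negative:
        return list(points)  # nothing to correct
    if mode == "removeObj":
        return None  # skip the whole object
    if mode == "toZero":
        # Replace each negative value with zero, leave the rest unchanged.
        return [(max(x, 0), max(y, 0)) for x, y in points]
    raise ValueError(f"unknown mode: {mode!r}")


outline = [(-90, 10), (50, 10), (50, 40), (-90, 40)]
print(fix_negative_coords(outline, "toZero"))
print(fix_negative_coords(outline, "removeObj"))
```

In this sketch, toZero keeps the object but moves the offending values onto the page edge, which changes its shape, while removeObj drops the object and leaves everything else untouched.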

ghost commented 4 years ago

Thanks man