PRImA-Research-Lab / prima-page-viewer

Java based viewer for PAGE XML files (layout + text content). Also supports ALTO XML, FineReader XML, and HOCR.
Apache License 2.0

Add a command line flag / configuration option to set base dir for resolving relative paths. #6

Closed kba closed 4 years ago

kba commented 5 years ago

Rationale

There are attributes in the PAGE XML page content schema that point to file names of images, notably pc:Page/@imageFilename and pc:AlternativeImage/@filename. When these paths are relative, tools like PAGE Viewer and Aletheia currently resolve them relative to the PAGE document. This only works if the referenced images are actually stored at those locations relative to the document. If tools could be configured to resolve relative paths against a different directory in the file system, image and document storage could be decoupled.

Proposal

Tools working with PAGE XML should have the concept of a resolve-dir: the base folder against which relative filenames are resolved. By default, resolve-dir is the directory containing the PAGE XML document if it is stored locally; otherwise it is /.
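
To illustrate the rule, here is a minimal Java sketch. It is not the actual PAGE Viewer implementation. The class and method names (ResolveDirSketch, defaultResolveDir, resolveImage) are hypothetical, and returning absolute filenames unchanged is an assumption that follows from the proposal covering only relative paths.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ResolveDirSketch {

    // Default from the proposal: the directory containing the PAGE XML
    // document if it is stored locally (non-null here), otherwise "/".
    static Path defaultResolveDir(Path pageXmlFile) {
        if (pageXmlFile != null) {
            Path parent = pageXmlFile.toAbsolutePath().getParent();
            if (parent != null) {
                return parent;
            }
        }
        return Paths.get("/");
    }

    // Resolve a value of pc:Page/@imageFilename or pc:AlternativeImage/@filename.
    // Assumption: absolute filenames are returned unchanged; only relative
    // ones are resolved against resolveDir.
    static Path resolveImage(Path resolveDir, String imageFilename) {
        Path image = Paths.get(imageFilename);
        if (image.isAbsolute()) {
            return image;
        }
        return resolveDir.resolve(image).normalize();
    }

    public static void main(String[] args) {
        // Hypothetical paths, for illustration only.
        Path pageXml = Paths.get("/data/page/0001.xml");
        Path explicitDir = Paths.get("/path/to/images");

        // Current behaviour: resolve next to the PAGE document.
        System.out.println(resolveImage(defaultResolveDir(pageXml), "0001.tif"));
        // Proposed behaviour: resolve against a configured resolve-dir.
        System.out.println(resolveImage(explicitDir, "0001.tif"));
    }
}
```

Keeping the default in its own method means a tool that is not given an explicit resolve-dir behaves exactly as before.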

The simplest user interface would be a command line flag, --resolve-dir:

java -jar JPageViewer.jar -- --resolve-dir /path/to/images

chris1010010 commented 5 years ago

Should work now as proposed

mrocr commented 5 years ago

yup :tada: @chris1010010 release latest build

chris1010010 commented 5 years ago

Released