UB-Mannheim / ocr-fileformat

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
https://digi.bib.uni-mannheim.de/ocr-fileformat/
MIT License

Wrong transformation, validation choices #49

Closed by zuphilip 7 years ago

zuphilip commented 7 years ago

As a result of https://github.com/UB-Mannheim/ocr-fileformat/pull/48 a new (wrong) transformation option showed up:

ocr-transforms-codes

The lines that should exclude it are in lib.sh.

However, it seems that the permission filtering does not work as expected, because the symlink itself has different permissions than the file it links to:

bash-4.3# stat xslt/codes_lookup.xml
  File: 'xslt/codes_lookup.xml' -> '../vendor/hOCR-to-ALTO/codes_lookup.xml'
  Size: 39              Blocks: 0          IO Block: 4096   symbolic link
Device: 21h/33d Inode: 199         Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)

==> We should use find -L instead, so that the permission test follows the symlinks and applies to the linked files.
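
A minimal sketch of the difference, not the actual code from lib.sh. It assumes the filter is a find -perm test on the owner-execute bit (the issue does not show which bits lib.sh checks), and the name example__transform.xsl is a hypothetical executable stylesheet made up for illustration:

```shell
#!/bin/sh
set -eu

# Scratch layout mimicking xslt/ -> ../vendor/ symlinks (paths are illustrative).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/vendor" "$tmp/xslt"

# A data file that must NOT be offered as a transformation (no execute bit).
printf '<codes/>\n' > "$tmp/vendor/codes_lookup.xml"
chmod 644 "$tmp/vendor/codes_lookup.xml"

# Hypothetical stylesheet that SHOULD be offered (execute bit set; an assumption).
printf '<xsl/>\n' > "$tmp/vendor/example__transform.xsl"
chmod 755 "$tmp/vendor/example__transform.xsl"

ln -s ../vendor/codes_lookup.xml "$tmp/xslt/codes_lookup.xml"
ln -s ../vendor/example__transform.xsl "$tmp/xslt/example__transform.xsl"

# Without -L, -perm inspects the symlink's own mode (lrwxrwxrwx in the stat output above).
without_L=$(cd "$tmp" && find xslt ! -type d -perm -100 | LC_ALL=C sort)

# With -L, find follows the links, so -perm inspects the target files' modes.
with_L=$(cd "$tmp" && find -L xslt ! -type d -perm -100 | LC_ALL=C sort)

printf 'without -L:\n%s\n' "$without_L"
printf 'with -L:\n%s\n' "$with_L"
```

On a typical Linux system the first call lists both links, while the second lists only the link whose target is executable, so codes_lookup.xml drops out.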