UB-Mannheim / ocr-fileformat

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
https://digi.bib.uni-mannheim.de/ocr-fileformat/
MIT License

installation problem under macOS 10.13.6 #88

Closed: jtlz2 closed this issue 5 years ago

jtlz2 commented 5 years ago

Thanks for the great tool.

Right now, when I run sudo make install, I get the following output:

(base) MacBook-Pro:ocr-fileformat$ sudo make install
/Applications/Xcode.app/Contents/Developer/usr/bin/make -C vendor check
# download the dependencies
/Applications/Xcode.app/Contents/Developer/usr/bin/make -C vendor all
mkdir -p xsd
# copy Alto XSD
cd xsd && ln -sf ../vendor/alto-schema/*/*.xsd . && \
        for xsd in *.xsd;do \
            target_xsd=`echo $xsd|sed 's/.//g'|sed 's/-/./'`; \
            if [ ! -e $target_xsd ];then \
                mv -f $xsd $target_xsd; \
            fi; done
# copy PAGE XSD
# copy ABBYY XSD
cd xsd && ln -sf ../vendor/abbyy-schema/*.xsd .
mkdir -p xslt
# symlink hocr<->alto as well as the language codes lookup xml
cd xslt && ln -sf ../vendor/hOCR-to-ALTO/hocr2alto2.0.xsl hocr__alto2.0.xsl
cd xslt && ln -sf ../vendor/hOCR-to-ALTO/hocr2alto2.1.xsl hocr__alto2.1.xsl
cd xslt && ln -sf ../vendor/hOCR-to-ALTO/alto2hocr.xsl alto2.0__hocr.xsl
cd xslt && ln -sf ../vendor/hOCR-to-ALTO/alto2hocr.xsl alto2.1__hocr.xsl
cd xslt && ln -sf ../vendor/hOCR-to-ALTO/hocr2text.xsl hocr__text.xsl
cd xslt && ln -sf ../vendor/hOCR-to-ALTO/alto2text.xsl alto__text.xsl
cd xslt && ln -sf ../vendor/hOCR-to-ALTO/codes_lookup.xml codes_lookup.xml
cd xslt && ln -sf ../vendor/format-converters/page2hocr.xsl page__hocr.xsl
cd xslt && ln -sf alto2.0__alto3.0.xsl alto2.0__alto3.1.xsl
cd xslt && ln -sf alto2.0__alto3.0.xsl alto2.1__alto3.0.xsl
cd xslt && ln -sf alto2.0__alto3.0.xsl alto2.1__alto3.1.xsl
mkdir -p /usr/local/share/ocr-fileformat
cp -r script xsd xslt vendor lib.sh /usr/local/share/ocr-fileformat
mkdir -p /usr/local/bin
sed '/^SHAREDIR=/c SHAREDIR="/usr/local/share/ocr-fileformat"' bin/ocr-transform.sh > /usr/local/bin/ocr-transform
sed: 1: "/^SHAREDIR=/c SHAREDIR= ...": command c expects \ followed by text
make: *** [install] Error 1

The Docker image runs fine, however.

What am I doing wrong?

Thanks again

jtlz2 commented 5 years ago

In fact, my sed -i trick didn't work.

jtlz2 commented 5 years ago

I fixed this by installing GNU sed, as described in https://stackoverflow.com/a/30047931/1021819, rather than using macOS's BSD flavour:

brew install gsed

Then I changed sed to gsed in the Makefile.

sudo make install then completes without error.
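The error comes from the one-line form of sed's c command in the install rule: GNU sed accepts "c TEXT" on a single line, while BSD sed (macOS) insists on "c\" followed by a newline and then the text. As an alternative to switching to gsed, here is a sketch, assuming the rule only needs to replace the SHAREDIR= line, that uses an s substitution both implementations accept. The helper name set_sharedir is hypothetical and not part of ocr-fileformat's Makefile.

```shell
#!/bin/sh
# Hypothetical helper: print a copy of a script with its SHAREDIR= line replaced.
# Uses the s command instead of GNU sed's one-line "c TEXT" form,
# so it behaves the same with BSD sed (macOS) and GNU sed.
set_sharedir() {
  sharedir=$1   # directory to embed, e.g. /usr/local/share/ocr-fileformat
  src=$2        # input script, e.g. bin/ocr-transform.sh
  # "|" as delimiter so the slashes in the path need no escaping
  sed "s|^SHAREDIR=.*|SHAREDIR=\"$sharedir\"|" "$src"
}

# Usage mirroring the failing Makefile step:
# set_sharedir /usr/local/share/ocr-fileformat bin/ocr-transform.sh > /usr/local/bin/ocr-transform
```

The POSIX spelling of the original command (c\, a newline, then the text) would also work with both seds, but it is awkward to write inside a one-line Makefile recipe.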

jtlz2 commented 5 years ago

I also had to upgrade macOS's default bash from version 3 to version 4 or later to make the scripts run:

See https://apple.stackexchange.com/a/292760/310129

Then edit ocr-validate, ocr-transform and lib.sh so that their shebang lines use /usr/bin/env bash rather than /bin/bash.

Rationale:

https://stackoverflow.com/questions/6047648/bash-4-associative-arrays-error-declare-a-invalid-option
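The linked answer explains that declare -A (associative arrays) only exists from bash 4 on, while macOS ships bash 3.2 as /bin/bash. A minimal sketch, assuming you would rather get a clear message than "declare: -A: invalid option", is a guard at the top of each script based on bash's built-in BASH_VERSINFO array. The error text and the formats array are illustrative only, not taken from ocr-fileformat.

```shell
#!/usr/bin/env bash
# env resolves bash via PATH (e.g. a Homebrew bash) instead of /bin/bash.

# Stop early on bash < 4, which has no associative arrays.
if (( BASH_VERSINFO[0] < 4 )); then
  echo "error: bash >= 4 required, found $BASH_VERSION" >&2
  exit 1
fi

# Illustrative associative array: the construct that fails on bash 3.
declare -A formats=([hocr]=html [alto]=xml)
echo "hocr -> ${formats[hocr]}"
```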

jtlz2 commented 5 years ago

Final hack:

brew install coreutils

Then change readlink to greadlink in ocr-validate.
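Instead of hard-coding greadlink, a script could pick whichever readlink supports -f. This is a sketch, assuming Homebrew's coreutils is installed with its default g prefix (so GNU readlink is greadlink) and that on Linux the plain readlink is already the GNU one. The variable names READLINK, SCRIPT_PATH and SCRIPT_DIR are hypothetical.

```shell
#!/usr/bin/env bash
# Prefer GNU readlink: Homebrew's coreutils installs it as "greadlink";
# elsewhere fall back to plain "readlink".
if command -v greadlink >/dev/null 2>&1; then
  READLINK=greadlink
else
  READLINK=readlink
fi

# Canonical absolute path of this script, following any symlinks.
SCRIPT_PATH=$("$READLINK" -f "${BASH_SOURCE[0]}")
SCRIPT_DIR=$(dirname "$SCRIPT_PATH")
echo "running from $SCRIPT_DIR"
```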

kba commented 5 years ago

Thanks for investigating. We do rely on bash >= 4 and GNU coreutils. It wouldn't be too hard to adapt the installation to the slight differences between the BSD/macOS tools and coreutils, but I don't have an OS X machine to test it with.

jmechnich commented 5 years ago

From brew info coreutils:

==> Caveats
Commands also provided by macOS have been installed with the prefix "g".
If you need to use these commands with their normal names, you can add a "gnubin" directory to your PATH from your bashrc like:
PATH="/usr/local/opt/coreutils/libexec/gnubin:$PATH"

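This fixes readlink etc. sed is not part of coreutils, but the separate gnu-sed formula offers the same kind of gnubin directory (/usr/local/opt/gnu-sed/libexec/gnubin).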
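A minimal sketch of that PATH change for ~/.bashrc, assuming the default /usr/local Homebrew prefix shown in the caveat; the second directory is the corresponding gnubin of Homebrew's gnu-sed formula. Each directory is only added if it exists, so the same snippet is harmless on machines without these packages.

```shell
# Put GNU coreutils and GNU sed ahead of the BSD tools, but only if the
# Homebrew gnubin directories actually exist on this machine.
for gnubin in /usr/local/opt/coreutils/libexec/gnubin \
              /usr/local/opt/gnu-sed/libexec/gnubin; do
  if [ -d "$gnubin" ]; then
    PATH="$gnubin:$PATH"
  fi
done
export PATH
```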

kba commented 5 years ago

Thanks. @jmechnich @jtlz2 If you want to document that in the README, I'd be happy to merge a PR.

jmechnich commented 5 years ago

Is there actually a good reason to use bash for those scripts? Maybe it would make sense to convert them to Python, as ocr-fileformat already depends on it anyway (and it is available on macOS by default).

stweil commented 5 years ago

Python 3 would be fine for me.