OCR-D / ocrd_keraslm

Simple character-based language model using keras
Apache License 2.0

AttributeError: 'KerasRate' object has no attribute 'rater' #21

Closed jbarth-ubhd closed 7 months ago

jbarth-ubhd commented 7 months ago

In Docker, with the ocrd/all:maximum image:

jb@pers16:~/workspace/ocrd-keras> date
Fr 22. Mär 10:37:52 CET 2024

jb@pers16:~/workspace/ocrd-keras> docker pull ocrd/all:maximum
maximum: Pulling from ocrd/all
Digest: sha256:f0321d84bdb293294e6a36efb5d4addca8acf305ee218a4860a54763d9f253d2
Status: Image is up to date for ocrd/all:maximum
docker.io/ocrd/all:maximum

jb@pers16:~/workspace/ocrd-keras> cat run.sh
#!/bin/bash
set -x
set -e
# docker-ocrd ocrd-import .
# docker-ocrd ocrd-tesserocr-recognize -I OCR-D-IMG -O OCR-D-OCR -P segmentation_level region -P textequiv_level word -P model deu
docker-ocrd ocrd-keraslm-rate -I OCR-D-OCR -O OCR-D-KERAS -P model_file /home/jb/ocrd-models/ocrd-keraslm-rate/model_dta_full.h5 -P textequiv_level word -P alternative_decoding false

jb@pers16:~/workspace/ocrd-keras> ./run.sh
+ docker-ocrd ocrd-keraslm-rate -I OCR-D-OCR -O OCR-D-KERAS -P model_file /home/jb/ocrd-models/ocrd-keraslm-rate/model_dta_full.h5 -P textequiv_level word -P alternative_decoding false
09:36:11.559 INFO processor.KerasRate - INPUT FILE 0 / p0002
09:36:11.636 INFO processor.KerasRate - Scoring text in page 'OCR-D-OCR_test-fouche10_5' at the word level
09:36:11.637 INFO ocrd.page_validator.validate - Validating input file 'OCR-D-OCR_test-fouche10_5'
09:36:11.870 INFO processor.KerasRate - Rating 1003 elements with a total of 3383 characters
09:36:11.870 ERROR ocrd.processor.helpers.run_processor - Failure in processor 'ocrd-keraslm-rate'
Traceback (most recent call last):
  File "/build/core/src/ocrd/processor/helpers.py", line 130, in run_processor
    processor.process()
  File "/build/ocrd_keraslm/ocrd_keraslm/wrapper/rate.py", line 110, in process
    confidences = self.rater.rate(textstring, context) # much faster
AttributeError: 'KerasRate' object has no attribute 'rater'
Traceback (most recent call last):
  File "/usr/local/sub-venv/headless-tf1/bin/ocrd-keraslm-rate", line 33, in <module>
    sys.exit(load_entry_point('ocrd-keraslm', 'console_scripts', 'ocrd-keraslm-rate')())
  File "/usr/local/sub-venv/headless-tf1/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/sub-venv/headless-tf1/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/sub-venv/headless-tf1/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/sub-venv/headless-tf1/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/build/ocrd_keraslm/ocrd_keraslm/wrapper/cli.py", line 9, in ocrd_keraslm_rate
    return ocrd_cli_wrap_processor(KerasRate, *args, **kwargs)
  File "/build/core/src/ocrd/decorators/__init__.py", line 133, in ocrd_cli_wrap_processor
    run_processor(processorClass, mets_url=mets, workspace=workspace, **kwargs)
  File "/build/core/src/ocrd/processor/helpers.py", line 133, in run_processor
    raise err
  File "/build/core/src/ocrd/processor/helpers.py", line 130, in run_processor
    processor.process()
  File "/build/ocrd_keraslm/ocrd_keraslm/wrapper/rate.py", line 110, in process
    confidences = self.rater.rate(textstring, context) # much faster
AttributeError: 'KerasRate' object has no attribute 'rater'
bertsky commented 7 months ago

Oops! The CI did not catch this, since it instantiates the processor differently. This broke recently when OCR-D/core changed the Processor initialization in run_processor (get_processor, then assigning workspace post-hoc).
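
As a rough illustration of the failure mode described above (a sketch only, not the actual ocrd_keraslm or OCR-D/core code): if a processor creates its `rater` only when a workspace is handed to the constructor, then a runner that constructs the processor first and assigns the workspace afterwards never triggers that code path. All names here (`SketchRater`, `FragileProcessor`, `RobustProcessor`, `run_post_hoc`) are hypothetical.

```python
class SketchRater:
    """Hypothetical stand-in for the language model wrapper."""

    def rate(self, text):
        # Return one dummy confidence per character.
        return [1.0 for _ in text]


class FragileProcessor:
    """Creates its rater only if a workspace is passed to the constructor."""

    def __init__(self, workspace=None):
        self.workspace = workspace
        if workspace is not None:
            self.rater = SketchRater()

    def process(self, text):
        # Fails with AttributeError when the workspace was assigned later.
        return self.rater.rate(text)


class RobustProcessor:
    """Defers model loading to a separate step that does not depend on
    how or when the workspace was provided."""

    def __init__(self, workspace=None):
        self.workspace = workspace
        self.rater = None

    def setup(self):
        self.rater = SketchRater()

    def process(self, text):
        if self.rater is None:
            self.setup()
        return self.rater.rate(text)


def run_post_hoc(processor_class, workspace, text):
    """Mimics a runner that instantiates first and assigns the workspace after."""
    processor = processor_class()
    processor.workspace = workspace
    return processor.process(text)
```

Calling `run_post_hoc(FragileProcessor, ...)` raises the same kind of `AttributeError` as in the report, while `RobustProcessor` works under both instantiation orders, which is also why a test that constructs the processor differently from the runner can miss the problem.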

I had already adapted this locally but did not push yet. Can you please try again with current master?

I.e. in ocrd/all, just do:

git -C /build/ocrd_keraslm checkout master
git -C /build/ocrd_keraslm pull origin master
make -C /build -W ocrd_keraslm ocrd-keraslm-rate
jbarth-ubhd commented 7 months ago

Tried building ocrd_all, but... uh, dependency hell:

In file included from /home/jb/ocrd_all/ocrd_olena/repo/olena/milena/mln/io/magick/all.hh:44,
                 from /home/jb/ocrd_all/ocrd_olena/repo/olena/scribo/src/binarization/global_threshold.cc:29:
/home/jb/ocrd_all/ocrd_olena/repo/olena/milena/mln/io/magick/load.hh: In function ‘void mln::io::magick::load(mln::Image<I>&, const string&)’:
/home/jb/ocrd_all/ocrd_olena/repo/olena/milena/mln/io/magick/load.hh:191:10: error: ‘PixelPacket’ is not a member of ‘Magick’; did you mean ‘MagickCore::PixelPacket’?
  191 |  Magick::PixelPacket* pixels = view.get(0, 0, ima.ncols(), ima.nrows());
      |          ^~~~~~~~~~~
In file included from /usr/local/include/ImageMagick-7/MagickCore/stream.h:25,

...
bertsky commented 7 months ago

Wait, that looks like a native build of ocrd_all from scratch – I thought you were in Docker?

For a native installation, just follow the Setup Guide, with the difference that you need the ocrd_keraslm update:

cd ocrd_all
git pull
make modules
sudo make deps-ubuntu
git -C ocrd_keraslm checkout master
git -C ocrd_keraslm pull origin master
make all NO_UPDATE=1

Or, if you already had the other ocrd_all modules, just do the equivalent of the above Docker recipe:

git -C ocrd_keraslm checkout master
git -C ocrd_keraslm pull origin master
make -W ocrd_keraslm ocrd-keraslm-rate NO_UPDATE=1
jbarth-ubhd commented 7 months ago

I just saw git and thought you meant building from source.

jbarth-ubhd commented 7 months ago

Let's try again...

jb@pers16:~> docker-ocrd git -C /build/ocrd_keraslm checkout master
Previous HEAD position was 472197f update assets
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
jb@pers16:~> docker-ocrd git -C /build/ocrd_keraslm pull origin master
From https://github.com/OCR-D/ocrd_keraslm
 * branch            master     -> FETCH_HEAD
   b996c82..ea79b2a  master     -> origin/master
Updating 472197f..ea79b2a
Fast-forward
 .circleci/config.yml                |  10 +-
 CHANGELOG.md                        |  12 +++
 Makefile                            |  14 ++-
 README.md                           | 183 +++++++++++++++++++++++++-----------
 ocrd_keraslm/lib/rating.py          |  56 ++++++++---
 ocrd_keraslm/scripts/run.py         |  51 +++++++---
 ocrd_keraslm/wrapper/ocrd-tool.json |  12 ++-
 ocrd_keraslm/wrapper/rate.py        |  89 +++++++++++-------
 setup.py                            |   1 +
 test/test_wrapper.py                |   9 +-
 10 files changed, 308 insertions(+), 129 deletions(-)
jb@pers16:~> docker-ocrd make -C /build -W ocrd_keraslm ocrd-keraslm-rate
make: Entering directory '/build'
make -o ocrd_keraslm ocrd-keraslm-rate keraslm-rate VIRTUAL_ENV=/usr/local/sub-venv/headless-tf1
make[1]: Entering directory '/build'
make[1]: Nothing to be done for 'ocrd-keraslm-rate'.
make[1]: Nothing to be done for 'keraslm-rate'.
make[1]: Leaving directory '/build'
chmod +x /usr/local/bin/ocrd-keraslm-rate /usr/local/bin/keraslm-rate
make: Leaving directory '/build'
bertsky commented 7 months ago

What exactly does your docker-ocrd do?

jbarth-ubhd commented 7 months ago

perhaps not the right thing for persistency:

jb@pers16:~/workspace/ocrd-keras> cat /usr/local/bin/docker-ocrd 
#!/bin/bash

docker_ocrd () {
    models_in_container="/models"
    if echo "$@" | grep -q ocrd-tesser
    then
        models_in_container="/usr/local/share" # https://github.com/OCR-D/ocrd_all/issues/394#issue-1950168885
    fi
    # $time singularity exec --bind $TMPDIR:/tmp --bind .:/data --bind $HOME/ocrd_models:$models_in_container -e --env-file $HOME/ocrd.env $HOME/ocrd.sif "$@"
    docker run --rm -u 0 -v $PWD:/data -v /home/jb/ocrd-models:$models_in_container -w /data -- ocrd/all:maximum "$@"
}

docker_ocrd "$@"
jbarth-ubhd commented 7 months ago
jb@pers16:~> docker run -u 0 -it --name "kerasxx" -- ocrd/all:maximum bash
/data$ git -C /build/ocrd_keraslm checkout master
Previous HEAD position was 472197f update assets
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
/data$ git -C /build/ocrd_keraslm pull origin master
remote: Enumerating objects: 41, done.
remote: Counting objects: 100% (41/41), done.
remote: Compressing objects: 100% (15/15), done.
remote: Total 41 (delta 25), reused 41 (delta 25), pack-reused 0
Unpacking objects: 100% (41/41), 8.66 KiB | 554.00 KiB/s, done.
From https://github.com/OCR-D/ocrd_keraslm
 * branch            master     -> FETCH_HEAD
   b996c82..ea79b2a  master     -> origin/master
Updating b996c82..ea79b2a
Fast-forward
 .circleci/config.yml         |  10 +++---
 Makefile                     |  14 ++++++--
 README.md                    | 165 ++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------
 ocrd_keraslm/lib/rating.py   |  29 ++++++++++++++---
 ocrd_keraslm/scripts/run.py  |  23 ++++++++++---
 ocrd_keraslm/wrapper/rate.py |  89 ++++++++++++++++++++++++++++++++------------------
 test/test_wrapper.py         |   9 +++---
 7 files changed, 216 insertions(+), 123 deletions(-)
/data$ make -C /build -W ocrd_keraslm ocrd-keraslm-rate
make: Entering directory '/build'
make -o ocrd_keraslm ocrd-keraslm-rate keraslm-rate VIRTUAL_ENV=/usr/local/sub-venv/headless-tf1
make[1]: Entering directory '/build'
make[1]: Nothing to be done for 'ocrd-keraslm-rate'.
make[1]: Nothing to be done for 'keraslm-rate'.
make[1]: Leaving directory '/build'
chmod +x /usr/local/bin/ocrd-keraslm-rate /usr/local/bin/keraslm-rate
make: Leaving directory '/build'
/data$ 
jb@pers16:~> docker commit -p -a "Jochen" -m "keras.." 2d1f96764b44 ocrd_kerasxx
sha256:174de6aac01422a69e3cc74238fdc00bdea9d30d647914d5bd3b1001b0d11444
bertsky commented 7 months ago

Ah, ok, updating in sub-venvs has become more difficult now. Simplest way…

# in Docker
rm /usr/local/sub-venv/headless-tf1/bin/ocrd-keraslm-rate
# native venv
rm venv/sub-venv/headless-tf1/bin/ocrd-keraslm-rate

…then the above make call

jbarth-ubhd commented 7 months ago
/data$ rm /usr/local/sub-venv/headless-tf1/bin/ocrd-keraslm-rate
/data$ git -C /build/ocrd_keraslm checkout master
Already on 'master'
Your branch is up to date with 'origin/master'.
/data$ git -C /build/ocrd_keraslm pull origin master
From https://github.com/OCR-D/ocrd_keraslm
 * branch            master     -> FETCH_HEAD
Already up to date.
/data$ make -C /build -W ocrd_keraslm ocrd-keraslm-rate
make: Entering directory '/build'
make -o ocrd_keraslm ocrd-keraslm-rate keraslm-rate VIRTUAL_ENV=/usr/local/sub-venv/headless-tf1
make[1]: Entering directory '/build'
. /usr/local/sub-venv/headless-tf1/bin/activate && if test 3.8 = 3.8 && ! pip show -q tensorflow-gpu; then sem -q --will-cite --fg --id ocrd_all_pipheadless-tf1 pip install nvidia-pyindex && pushd $(mktemp -d) && sem -q --will-cite --fg --id ocrd_all_pipheadless-tf1 pip download --no-deps "nvidia-tensorflow==1.15.5+nv22.12" && for name in nvidia_tensorflow-*.whl; do name=${name%.whl}; done && python3 -m wheel unpack $name.whl && for name in nvidia_tensorflow-*/; do name=${name%/}; done && newname=${name/nvidia_tensorflow/tensorflow_gpu} && sed -i s/nvidia_tensorflow/tensorflow_gpu/g $name/$name.dist-info/METADATA && sed -i s/nvidia_tensorflow/tensorflow_gpu/g $name/$name.dist-info/RECORD && sed -i s/nvidia_tensorflow/tensorflow_gpu/g $name/tensorflow_core/tools/pip_package/setup.py && pushd $name && for path in $name*; do mv $path ${path/$name/$newname}; done && popd && python3 -m wheel pack $name && sem -q --will-cite --fg --id ocrd_all_pipheadless-tf1 pip install --no-cache-dir $newname*.whl && popd && rm -fr $OLDPWD; fi
# - preempt conflict over numpy between scikit-image and tensorflow
# - preempt conflict over numpy between tifffile and tensorflow (and allow py36)
. /usr/local/sub-venv/headless-tf1/bin/activate && sem -q --will-cite --fg --id ocrd_all_pipheadless-tf1 pip install imageio==2.14.1 "tifffile<2022"
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting imageio==2.14.1
  Downloading imageio-2.14.1-py3-none-any.whl.metadata (4.0 kB)
Collecting tifffile<2022
  Downloading tifffile-2021.11.2-py3-none-any.whl.metadata (29 kB)
Requirement already satisfied: numpy in /usr/local/sub-venv/headless-tf1/lib/python3.8/site-packages (from imageio==2.14.1) (1.23.5)
Requirement already satisfied: pillow>=8.3.2 in /usr/local/sub-venv/headless-tf1/lib/python3.8/site-packages (from imageio==2.14.1) (10.2.0)
Downloading imageio-2.14.1-py3-none-any.whl (3.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.3/3.3 MB 53.1 MB/s eta 0:00:00
Downloading tifffile-2021.11.2-py3-none-any.whl (178 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 178.9/178.9 kB 147.8 MB/s eta 0:00:00
Installing collected packages: tifffile, imageio
  Attempting uninstall: tifffile
    Found existing installation: tifffile 2023.7.10
    Uninstalling tifffile-2023.7.10:
      Successfully uninstalled tifffile-2023.7.10
  Attempting uninstall: imageio
    Found existing installation: imageio 2.34.0
    Uninstalling imageio-2.34.0:
      Successfully uninstalled imageio-2.34.0
Successfully installed imageio-2.14.1 tifffile-2021.11.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
scikit-image 0.21.0 requires imageio>=2.27, but you have imageio 2.14.1 which is incompatible.
scikit-image 0.21.0 requires tifffile>=2022.8.12, but you have tifffile 2021.11.2 which is incompatible.
# - preempt conflict over numpy between h5py and tensorflow
. /usr/local/sub-venv/headless-tf1/bin/activate && sem -q --will-cite --fg --id ocrd_all_pipheadless-tf1 pip install "numpy<1.24"
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: numpy<1.24 in /usr/local/sub-venv/headless-tf1/lib/python3.8/site-packages (1.23.5)
. /usr/local/sub-venv/headless-tf1/bin/activate && cd ocrd_keraslm && sem -q --will-cite --fg --id ocrd_all_pipheadless-tf1 pip install --timeout=3000 -e . && touch -c /usr/local/sub-venv/headless-tf1/bin/ocrd-keraslm-rate
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///build/ocrd_keraslm
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Requirement already satisfied: ocrd>=2.13.1 in /usr/local/sub-venv/headless-tf1/lib/python3.8/site-packages (from ocrd_keraslm==0.4.3) (2.63.3)
...
...
Installing collected packages: ocrd_keraslm
  Attempting uninstall: ocrd_keraslm
    Found existing installation: ocrd_keraslm 0.4.2
    Uninstalling ocrd_keraslm-0.4.2:
      Successfully uninstalled ocrd_keraslm-0.4.2
  Running setup.py develop for ocrd_keraslm
Successfully installed ocrd_keraslm-0.4.3
make[1]: Nothing to be done for 'keraslm-rate'.
make[1]: Leaving directory '/build'
chmod +x /usr/local/bin/ocrd-keraslm-rate /usr/local/bin/keraslm-rate
make: Leaving directory '/build'
/data$
bertsky commented 7 months ago

Regarding "perhaps not the right thing for persistency": that wrapper instantiates a new container with each invocation, so nothing is shared or persisted between calls, and none of the above recipes would work.

If you really want this behaviour, then use docker exec instead of docker run and try to just reuse the same container each time.
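
A minimal sketch of that suggestion, assuming a long-lived container that is started once and then reused: the helpers below only build the command lines and do not run anything. The container name `ocrd-persistent`, the function names, and the use of `sleep infinity` to keep the container alive are assumptions for illustration; the mounts mirror the `docker-ocrd` wrapper shown earlier.

```python
import shlex

IMAGE = "ocrd/all:maximum"
CONTAINER = "ocrd-persistent"  # hypothetical container name


def start_command(data_dir, models_dir, models_in_container="/models"):
    """Command that starts one detached, long-lived container (run once)."""
    return [
        "docker", "run", "-d", "--name", CONTAINER, "-u", "0",
        "-v", f"{data_dir}:/data",
        "-v", f"{models_dir}:{models_in_container}",
        "-w", "/data",
        IMAGE, "sleep", "infinity",
    ]


def exec_command(args):
    """Command that runs a processor inside the already running container."""
    return ["docker", "exec", "-u", "0", "-w", "/data", CONTAINER, *args]


if __name__ == "__main__":
    # Print the commands instead of executing them.
    print(shlex.join(start_command("/path/to/workspace", "/path/to/models")))
    print(shlex.join(exec_command(["ocrd-keraslm-rate", "--help"])))
```

Because the same container is reused, changes made inside it (such as the git pull and make steps above) remain in place for later calls. One trade-off is that the data directory is fixed when the container starts, rather than following the current directory on each invocation.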

EDIT: also, there should be no need for the /models workaround anymore for tessdata.

bertsky commented 7 months ago

So, does it work now?

jbarth-ubhd commented 7 months ago

works:

jb@pers16:~/workspace/ocrd-keras> ./run.sh
+ set -e
+ docker-ocrd ocrd-keraslm-rate -I OCR-D-OCR -O OCR-D-KERAS -P model_file model_dta_full.h5 -P textequiv_level word -P alternative_decoding false
Using TensorFlow backend.
11:53:51.700 WARNING root - Limited tf.compat.v2.summary API due to missing TensorBoard installation.
11:53:51.852 INFO processor.KerasRate - using CPU LSTM implementation to compile stateful contiguous model of depth 2 width 128 length 256 size 1273
11:53:52.407 INFO processor.KerasRate - INPUT FILE 0 / p0002
11:53:52.440 INFO processor.KerasRate - Scoring text in page 'OCR-D-OCR_test-fouche10_5' at the word level
11:53:52.441 INFO ocrd.page_validator.validate - Validating input file 'OCR-D-OCR_test-fouche10_5'
11:53:52.677 INFO processor.KerasRate - Rating 1003 elements with a total of 3383 characters
 1/14 [=>............................] - ETA: 2s
 2/14 [===>..........................] - ETA: 1s
 3/14 [=====>........................] - ETA: 1s
 5/14 [=========>....................] - ETA: 0s
 6/14 [===========>..................] - ETA: 0s
 7/14 [==============>...............] - ETA: 0s
 8/14 [================>.............] - ETA: 0s
 9/14 [==================>...........] - ETA: 0s
10/14 [====================>.........] - ETA: 0s
11/14 [======================>.......] - ETA: 0s
12/14 [========================>.....] - ETA: 0s
13/14 [==========================>...] - ETA: 0s
14/14 [==============================] - 1s 70ms/step
11:53:53.703 INFO processor.KerasRate - avg: 0.334, char ppl: 7.185, word ppl: 773.807
11:53:53.719 INFO ocrd.process.profile - Executing processor 'ocrd-keraslm-rate' took 1.312577s (wall) 2.464611s (CPU)( [--input-file-grp='OCR-D-OCR' --output-file-grp='OCR-D-KERAS' --parameter='{"model_file": "model_dta_full.h5", "textequiv_level": "word", "alternative_decoding": false, "beam_width": 10, "lm_weight": 0.5}' --page-id='']
jbarth-ubhd commented 7 months ago

So the conf attributes are overwritten?

35a36,48
>         <pc:MetadataItem type="processingStep" name="recognition/text-recognition" value="ocrd-keraslm-rate">
>             <pc:Labels externalModel="ocrd-tool" externalId="parameters">
>                 <pc:Label value="model_dta_full.h5" type="model_file"/>
>                 <pc:Label value="word" type="textequiv_level"/>
>                 <pc:Label value="False" type="alternative_decoding"/>
>                 <pc:Label value="10" type="beam_width"/>
>                 <pc:Label value="0.5" type="lm_weight"/>
>             </pc:Labels>
>             <pc:Labels externalModel="ocrd-tool" externalId="version">
>                 <pc:Label value="0.4.3" type="ocrd-keraslm-rate"/>
>                 <pc:Label value="2.63.3" type="ocrd/core"/>
>             </pc:Labels>
>         </pc:MetadataItem>
53c66
<                     <pc:TextEquiv conf="0.966855773925781">
---
>                     <pc:TextEquiv conf="0.736842163484543">
59c72
<                     <pc:TextEquiv conf="0.962821807861328">
---
>                     <pc:TextEquiv conf="0.62280951615423">
65c78
<                     <pc:TextEquiv conf="0.966287384033203">
---
>                     <pc:TextEquiv conf="0.677824027637641">
bertsky commented 7 months ago

Regarding whether the conf attributes are overwritten: yes, see the docstring, --help, or the README.
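
To check this on your own files, a small sketch like the following can list which `TextEquiv` confidences differ between the input and output PAGE files. It matches elements by local tag name so it does not depend on the exact PAGE namespace. The function names, the sample XML, the placeholder namespace `urn:example:page`, and the word text are made up for illustration; only the two conf values are taken from the diff above.

```python
import xml.etree.ElementTree as ET


def conf_values(page_xml):
    """Return the conf attribute of every TextEquiv element, in document order."""
    root = ET.fromstring(page_xml)
    values = []
    for elem in root.iter():
        local_name = elem.tag.rsplit("}", 1)[-1]  # strip the "{namespace}" prefix
        if local_name == "TextEquiv" and "conf" in elem.attrib:
            values.append(float(elem.attrib["conf"]))
    return values


def changed_confs(before_xml, after_xml):
    """Pair up confidences by position and keep only the pairs that differ."""
    pairs = zip(conf_values(before_xml), conf_values(after_xml))
    return [(old, new) for old, new in pairs if old != new]


# Tiny made-up documents; the namespace URI is a placeholder, not the real one.
TEMPLATE = (
    '<pc:PcGts xmlns:pc="urn:example:page"><pc:Word>'
    '<pc:TextEquiv conf="{conf}"><pc:Unicode>word</pc:Unicode></pc:TextEquiv>'
    "</pc:Word></pc:PcGts>"
)
BEFORE = TEMPLATE.format(conf="0.966855773925781")
AFTER = TEMPLATE.format(conf="0.736842163484543")

if __name__ == "__main__":
    for old, new in changed_confs(BEFORE, AFTER):
        print(f"{old} -> {new}")
```

Pairing by position assumes both files contain the same elements in the same order, which holds when the processor only rewrites confidences and leaves the segmentation unchanged.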