Closed stefanCCS closed 1 year ago
Release v2023-03-26
that's before the recent fixes in https://github.com/OCR-D/ocrd_all/pull/362 and https://github.com/OCR-D/core/pull/1041
Please use the most recent version.
Also, IMO the report should go to ocrd_all repo, not here. (There's a test-cuda target there, too.)
Sorry for using the wrong repo ... Concerning the "most recent version": I did a "git pull" today. So maybe the recent version is not yet published?
OK, I guess you did use the current version after all. From the title of the issue it sounded like an older checkout. (Current tip is d8cdeec43c5e5315f76ee1b8d196bb343c6aeaad. The most recent tag is v2023-06-14. I have no idea where your v2023-03-26 comes from...)
In detail: I have downloaded latest version with:
cd ~/ocrd_all
git pull
Then I have made:
sudo make deps-cuda
That's not the correct procedure after an update, though. You first have to make sure your submodules are up to date. Doing make all would have implied that, but deps-cuda only implies an update of core, and since that runs under sudo privileges, it will likely chown parts of .git. Therefore the README recommends doing make ocrd (without sudo) before make deps-cuda.
Can you check with git submodule status and find .git -user 0?
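For reference, the submodule check can be wrapped in a small helper. This is only a sketch: the function name out_of_sync_submodules is made up for illustration, and it assumes the usual git submodule status format, where a leading space marks a submodule that matches the recorded commit (the same convention the grep -qv '^ ' test in the ocrd_all Makefile output relies on).

```shell
# Hypothetical helper (not part of ocrd_all): reads "git submodule status"
# output on stdin and prints the paths of submodules that are not in sync.
# Line prefixes: ' ' in sync, '-' not initialized, '+' different commit,
# 'U' merge conflict.
out_of_sync_submodules() {
    grep -v '^ ' | awk '{ print $2 }'
}

# Usage, from the top of the ocrd_all checkout:
#   git submodule status | out_of_sync_submodules
#   find .git -user 0        # files owned by root (uid 0) inside .git
```

An empty result from both commands would suggest that the checkout itself is consistent and not root-owned.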
I have used this "CUDA_VERSION" parameter, as on my system somehow CUDA-Version 11.6 is the default, and I have seen that CUDA-Version 11.8 was installed with sudo make deps-cuda.
Yes, that's the best way to do it. The version identifier recipe in ocrd_detectron2 picks whatever matches first, unless using this override. Alas, at the moment we cannot guarantee that ocrd_kraken and ocrd_typegroups_classifier (which also depend on PyTorch) do not overwrite it with their version. Best way to check is make test-cuda.
This has run successfully with one remark: Somewhere in between I have seen this error message:
...
Synchronizing submodule url for 'ocrd_fileformat/repo/ocr-fileformat/vendor/xsd-validator'
if git submodule status --recursive ocrd_fileformat | grep -qv '^ '; then \
    sem -q --will-cite --fg --id ocrd_all_git git submodule update --init --recursive ocrd_fileformat && \
    touch ocrd_fileformat; fi
fatal: failed to recurse into submodule 'ocrd_fileformat'
Sounds like an issue with your checkout. It could be the ownership problem alluded to above, or some previous failure. Strange though that despite git's complaint, it does not exit with error and does in fact recurse...
make test-cuda CUDA_VERSION=11.8
That variable is only relevant at install time – it has no effect on the test itself.
2023-06-20 12:00:03.521351: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublas.so.11'; dlerror: /home/gputest/ocrd-3.8/sub-venv/headless-tf1/lib/python3.8/site-packages/tensorflow_core/python/../../nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLt_for_cublas_HSS, version libcublasLt.so.11; LD_LIBRARY_PATH: /usr/local/cuda-11.6/lib64:
OK, so apparently somewhere in your environment you have a dynamic linker override variable LD_LIBRARY_PATH, which points to CUDA 11.6, so our 11.8 (installed by deps-cuda via ld.so.conf rules) has no chance. This was a deliberate choice BTW: we don't want to be intrusive (and setting LD_LIBRARY_PATH is as intrusive as it gets).
Could you please check where that variable is coming from? (I.e. bashrc or profile or venv..., and who installed that, e.g. CUDA installer script, or manual)
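A quick way to answer that question is sketched below. The helper name cuda_path_entries is hypothetical, and the two startup files in the usage comment are just common candidates, not an exhaustive list.

```shell
# Hypothetical helper: print every CUDA-looking entry of a colon-separated
# library path. With no argument it inspects the current LD_LIBRARY_PATH.
cuda_path_entries() {
    printf '%s\n' "${1-${LD_LIBRARY_PATH-}}" | tr ':' '\n' | grep -i 'cuda' || true
}

# Usage:
#   cuda_path_entries
#   grep -n LD_LIBRARY_PATH ~/.bashrc ~/.profile 2>/dev/null   # where is it set?
```

If the first command prints a directory of a different CUDA release than the one just installed, the dynamic linker will look there first.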
Many thanks @bertsky for your detailed comments. And, I am very sorry for any confusion I might have created with putting the wrong Release Name. This was just a stupid copy&paste error. Of course, I have used v2023-06-14. I have updated issue title and my first comment accordingly.
I will follow your advice and come back with another feedback ...
git submodule status creates this list:
2c4b1ffc123e867cc5e5203970996bfb05075397 cor-asv-ann (v0.1.2-99-g2c4b1ff)
-076e04ef882bbed0b5b70e6a6a461940b82bb404 cor-asv-fst
670862493408008441963a739ef650c6d3fa122d core (v2.33.0-790-g670862493)
35be58cb9456b0893bc46640b234912148621fb6 dinglehopper (remotes/origin/HEAD)
a7ffdda68a4c9c4e0b0494e7b0f865d92297ac30 docstruct (heads/master)
706433c5049c63c6e16fee5f71d81a7e507abe06 eynollah (v0.2.0-7-g706433c)
9615db1920cb8e15a38427333b41cdbee8baf4b6 format-converters (heads/master)
cf7c60f898039d765984a7eb8704e7e0fbe6c88d nmalign (v0.0.3-7-gcf7c60f)
5978a1fef1b5b863f71e0a9abd1ff8668876c661 ocrd_anybaseocr (v1.9.0-2-g5978a1f)
3a029ca512cec911aa32f7156c831c0cca75543f ocrd_calamari (v1.0.5-11-g3a029ca)
a0ea0a2a4aeea99414c08ae543585b994f9ab0d5 ocrd_cis (v0.0.10-149-ga0ea0a2)
04bf4c6d325ca383671e463543ffe132f3b70f19 ocrd_detectron2 (v0.1.7-17-g04bf4c6)
a95f8e77886c9860101392d088742ca0af277945 ocrd_doxa (v0.0.2)
4e7e0de68e2a0dcd9b238f64d1657beda0d74da7 ocrd_fileformat (v0.5.0-15-g4e7e0de)
105697f589839cc14d8a1e3be939598e2be1b06f ocrd_im6convert (v0.0.5)
9e3f5a06b8efb706f8f1ac1c172fa5809ad6bab9 ocrd_keraslm (0.3.1-33-g9e3f5a0)
b13dd8a932b7dfbfe5019698e87542f5f767e2bd ocrd_kraken (v0.3.0-21-gb13dd8a)
0f64f07635875bc75a53365e425870858b0d388a ocrd_neat (v0.0.1)
-a6e556ec182bb18b755bfd818e7f72326b5819fa ocrd_ocropy
6bcbb4bbb6847e581bdb84aa1c2c32b632d083f2 ocrd_olahd_client (v0.0.2)
dbef5340432a0a138f6cd07e3e321a2fa5e658e2 ocrd_olena (v1.3.0)
4f4a330c97208635e7b304cfce4db9e937fefd2b ocrd_pagetopdf (v1.0.0-12-g4f4a330)
-ead3fdd19c9dceb69499d8e2267e71b9cd3bcd2c ocrd_pc_segmentation
c898d6ce2de46abc06d1f88b4b919b768d073c41 ocrd_repair_inconsistencies (heads/master)
3c63e21b168b83bbb02caf4ce212db94447a5f4b ocrd_segment (v0.1.21-9-g3c63e21)
09d1e13cdaf056c8542a7adbbc9b9927e2a54d2b ocrd_tesserocr (v0.2.2-454-g09d1e13)
a78a85f57f27a28f01dd125e67d0e7676a1c7566 ocrd_typegroups_classifier (v0.5.0)
2cd800d9eccbc084751558a87972ac22ee60e87a ocrd_wrap (v0.1.8)
-474a1cc0ebf2086c596b60c050a9e1af658ff380 opencv-python
010ec99d2a666c363efb7e50c1eb2423857ff092 sbb_binarization (v0.1.0)
1569e5080810f4652b720bcd344026a9b236ec50 tesseract (5.3.0-46-g1569e508)
e184c62becd1c3c87c0546c9df506d639de8478d tesserocr (v2.1.2-127-ge184c62)
5aff777c761cae1b6f9d954fb80f9b212e8fab92 workflow-configuration (remotes/origin/HEAD)
and find .git -user 0 is empty:
gputest@linuxgputest2:~/ocrd_all$ find .git -user 0
gputest@linuxgputest2:~/ocrd_all$
Concerning LD_LIBRARY_PATH you are right - it points to version 11.6:
gputest@linuxgputest2:~/ocrd_all$ echo $LD_LIBRARY_PATH
/usr/local/cuda-11.6/lib64:
gputest@linuxgputest2:~/ocrd_all$
Yes, it is set in .bashrc:
gputest@linuxgputest2:~$ grep LD_LIBRARY_PATH .bashrc
export LD_LIBRARY_PATH=/usr/local/cuda-11.6/lib64:$LD_LIBRARY_PATH
gputest@linuxgputest2:~$
Well, of course I could change this, but I have no idea, where the 11.8 version was installed to - see:
gputest@linuxgputest2:/usr/local$ ls -ld cuda*
lrwxrwxrwx 1 root root 22 Mar 29 2022 cuda -> /etc/alternatives/cuda
lrwxrwxrwx 1 root root 25 Mar 29 2022 cuda-11 -> /etc/alternatives/cuda-11
drwxr-xr-x 16 root root 4096 Mar 29 2022 cuda-11.6
For both folders cuda and cuda-11 I can only find version 11.6, e.g.:
gputest@linuxgputest2:/etc/alternatives/cuda-11/lib64$ ll libnppidei*
lrwxrwxrwx 1 root root 16 Mar 9 2022 libnppidei.so -> libnppidei.so.11
lrwxrwxrwx 1 root root 22 Mar 9 2022 libnppidei.so.11 -> libnppidei.so.11.6.3.9
-rw-r--r-- 1 root root 9659544 Mar 9 2022 libnppidei.so.11.6.3.9
-rw-r--r-- 1 root root 10209110 Mar 9 2022 libnppidei_static.a
A find also gives no hint as to where version 11.8 is:
gputest@linuxgputest2:/$ find . -name "libnvjpeg.so.11.8*" 2>&1 | grep -v "Permission denied"
gputest@linuxgputest2:/$
--> So, please tell me, where I can find the version 11.8
Anyway, I will re-install now, following your advice from above, @bertsky.
Hmm, not good ...
I have simply done another git pull in directory ~/ocrd_all, which, a bit surprisingly, brought in a few new files:
("surprisingly", because I have assumed, that I only get the data of the newest official release (here "v2023-06-14") and not any "random" new data - looks like my assumption is wrong?!)
remote: Enumerating objects: 33, done.
remote: Counting objects: 100% (33/33), done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 33 (delta 17), reused 26 (delta 13), pack-reused 0
Unpacking objects: 100% (33/33), 21.29 KiB | 1.42 MiB/s, done.
From https://github.com/OCR-D/ocrd_all
d8cdeec..8a68597 master -> origin/master
Updating d8cdeec..8a68597
Fast-forward
.github/workflows/makeall.yml | 5 ++---
CHANGELOG.md | 9 +++++++++
Dockerfile | 13 +++++++++++++
Makefile | 11 ++++++++---
4 files changed, 32 insertions(+), 6 deletions(-)
Then I just called make ocrd, and this produced the following error (mainly I see at the beginning:
ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/home/gputest/ocrd_all/venv/lib/python3.8/site-packages/PIL'
but please check also the rest of the message):
...
Successfully built ocrd-utils atomicwrites
Installing collected packages: Pillow, numpy, frozendict, atomicwrites, ocrd-utils
ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/home/gputest/ocrd_all/venv/lib/python3.8/site-packages/PIL'
Check the permissions.
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Processing /home/gputest/ocrd_all/core/ocrd_models
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'error'
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/home/gputest/ocrd_all/core/ocrd_models/setup.py", line 4, in <module>
from ocrd_utils import VERSION
ModuleNotFoundError: No module named 'ocrd_utils'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Processing /home/gputest/ocrd_all/core/ocrd_modelfactory
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'error'
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/home/gputest/ocrd_all/core/ocrd_modelfactory/setup.py", line 4, in <module>
from ocrd_utils import VERSION
ModuleNotFoundError: No module named 'ocrd_utils'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Processing /home/gputest/ocrd_all/core/ocrd_validators
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'error'
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/home/gputest/ocrd_all/core/ocrd_validators/setup.py", line 4, in <module>
from ocrd_utils import VERSION
ModuleNotFoundError: No module named 'ocrd_utils'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Processing /home/gputest/ocrd_all/core/ocrd_network
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'error'
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/home/gputest/ocrd_all/core/ocrd_network/setup.py", line 3, in <module>
from ocrd_utils import VERSION
ModuleNotFoundError: No module named 'ocrd_utils'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Processing /home/gputest/ocrd_all/core/ocrd
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'error'
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/home/gputest/ocrd_all/core/ocrd/setup.py", line 3, in <module>
from ocrd_utils import VERSION
ModuleNotFoundError: No module named 'ocrd_utils'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
make[1]: *** [Makefile:120: install] Error 1
make[1]: Leaving directory '/home/gputest/ocrd_all/core'
make: *** [Makefile:230: /home/gputest/ocrd_all/venv/bin/ocrd] Error 2
--> It looks like the PIL folder (or file?) simply does not exist here:
gputest@linuxgputest2:~/ocrd_all$ ls /home/gputest/ocrd_all/venv/lib/python3.8/site-packages/
_distutils_hack nvidia_cuda_nvrtc_cu117-11.7.50.dist-info nvidia_curand_cu11-2022.4.8.dist-info nvidia_pyindex setuptools
distutils-precedence.pth nvidia_cuda_runtime_cu11-2022.4.25.dist-info nvidia_curand_cu117-10.2.10.50.dist-info nvidia_pyindex-1.0.9.dist-info setuptools-68.0.0.dist-info
nvidia nvidia_cuda_runtime_cu117-11.7.60.dist-info nvidia_cusolver_cu11-2022.4.8.dist-info pip wheel
nvidia_cublas_cu11-2022.4.8.dist-info nvidia_cudnn_cu11-8.6.0.163.dist-info nvidia_cusolver_cu117-11.3.5.50.dist-info pip-23.1.2.dist-info wheel-0.40.0.dist-info
nvidia_cublas_cu117-11.10.1.25.dist-info nvidia_cufft_cu11-2022.4.8.dist-info nvidia_cusparse_cu11-2022.4.8.dist-info pkg_resources
nvidia_cuda_nvrtc_cu11-2022.4.8.dist-info nvidia_cufft_cu117-10.7.2.50.dist-info nvidia_cusparse_cu117-11.7.3.50.dist-info pkg_resources-0.0.0.dist-info
-> any recommendation?
Concerning LD_LIBRARY_PATH you are right - it points to version 11.6: Well, of course I could change this, but I have no idea where the 11.8 version was installed to - see:
Have a look at the deps-cuda target in core/Makefile: it will
So all you need to do AFAICS (while using ocrd_all) is to suppress your LD_LIBRARY_PATH envvar (either by setting it to empty in your current shell or by commenting out the setting in .bashrc).
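As an illustration of the first option, a command can also be run with the variable removed for that one invocation only. The wrapper name without_ld_library_path below is made up for this sketch; env -u is the standard way to drop a variable from a child's environment.

```shell
# Hypothetical wrapper: run a command with LD_LIBRARY_PATH removed from its
# environment, leaving the current shell untouched.
without_ld_library_path() {
    env -u LD_LIBRARY_PATH "$@"
}

# Usage:
#   without_ld_library_path make test-cuda
# Or, for the rest of the current shell session:
#   unset LD_LIBRARY_PATH
```

The wrapper is handy for a one-off test; commenting out the export in .bashrc is the more permanent fix.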
Another git pull in directory ~/ocrd_all, which has a bit surprisingly given a few new files: ("surprisingly", because I have assumed, that I only get the data of the newest official release (here "v2023-06-14") and not any "random" new data - looks like my assumption is wrong?!)
There have been merges with additional improvements, but no new release yet (which I guess is normal dev cycle, so I'm surprised you're surprised...).
Permission denied: '/home/gputest/ocrd_all/venv/lib/python3.8/site-packages/PIL'
That was what I had suspected. Only strange that the find -user 0 did not catch it.
So please sudo chown -R uid:gid ~/ocrd_all to fix what went wrong last time.
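One possible explanation for the miss is that the earlier find only looked inside .git, while the failing path is under venv. A broader check could look like the sketch below; the helper name not_owned_by_me is hypothetical, and id -u / id -g are used to fill in the uid:gid placeholders for the current user.

```shell
# Hypothetical helper: list everything below a directory that is NOT owned
# by the current user (e.g. left behind by an earlier make run under sudo).
not_owned_by_me() {
    find "$1" ! -user "$(id -u)"
}

# Usage:
#   not_owned_by_me ~/ocrd_all
#   sudo chown -R "$(id -u):$(id -g)" ~/ocrd_all   # hand the tree back to yourself
```

After the chown, the first command should print nothing.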
(There's no need to re-do sudo make deps-cuda BTW.)
but please check also the rest of the message
-> any recommendation?
Looks like follow-up errors. To be on the safe side, make clean before the next make all.
Looks like I made it :-)
make test-cuda results in: everything seems to be fine.
So, many thanks @bertsky for your support. I will close this issue here now.
Splendid. So to recap:
* make modules before any sudo action
* if LD_LIBRARY_PATH is set, then that needs to be suppressed
BTW, to be on the safe side, consider running make test-workflow (i.e. coverage test) afterwards.
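Put together, the update order discussed in this thread could be sketched as follows. The function names run and update_ocrd_all are invented for this example, the script only prints the commands unless DRY_RUN=0 is set, and the exact set of targets is an assumption pieced together from the comments above rather than an official procedure.

```shell
# Print a command (default) or execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical update sequence, to be called from the ocrd_all checkout.
update_ocrd_all() {
    run unset LD_LIBRARY_PATH &&   # do not let an old CUDA path shadow the new one
    run git pull &&
    run make modules &&            # update submodules as a normal user first
    run sudo make deps-cuda &&     # only then anything that needs root
    run make all &&
    run make test-cuda
}

# Usage:
#   update_ocrd_all              # shows what would be run
#   DRY_RUN=0 update_ocrd_all    # actually runs it
```

Keeping the non-root steps ahead of the sudo step is the point of the ordering: it avoids root-owned files appearing in the checkout or the venv.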
Concerning "recap":
* I have made a make ocrd before sudo make deps-cuda
* LD_LIBRARY_PATH - correct
Concerning make test-workflow: I have my own test workflow, which runs fine (of course this tests only the basic modules I use).
Now I have called make test-workflow and get this error:
2023-06-23 11:27:29.540 INFO ocrd.cli.resmgr - Use in parameters as 'default-2021-03-09'
+ ocrd-sbb-binarize -I OCR-D-IMG -O OCR-D-BIN -P model default-2021-03-09
2023-06-23 11:27:38.592 INFO processor.SbbBinarize - INPUT FILE 0 / PHYS_0001
2023-06-23 11:27:39.011 INFO processor.SbbBinarize - Binarizing on 'page' level in page 'PHYS_0001'
2023-06-23 11:27:39.052 INFO processor.SbbBinarize.__init__ - Predicting with model /home/gputest/.local/share/ocrd-resources/ocrd-sbb-binarize/default-2021-03-09/saved_model_2021_03_09/ [1/1]
2023-06-23 11:27:40.975 ERROR ocrd.processor.helpers.run_processor - Failure in processor 'ocrd-sbb-binarize'
Traceback (most recent call last):
File "/home/gputest/ocrd-3.8/lib/python3.8/site-packages/ocrd/processor/helpers.py", line 128, in run_processor
processor.process()
File "/home/gputest/ocrd-3.8/lib/python3.8/site-packages/sbb_binarize/ocrd_cli.py", line 113, in process
bin_image = cv2pil(self.binarizer.run(image=pil2cv(page_image)))
File "/home/gputest/ocrd-3.8/lib/python3.8/site-packages/sbb_binarize/sbb_binarize.py", line 244, in run
res = self.predict(model, image)
File "/home/gputest/ocrd-3.8/lib/python3.8/site-packages/sbb_binarize/sbb_binarize.py", line 157, in predict
label_p_pred = model.predict(img_patch.reshape(1, img_patch.shape[0], img_patch.shape[1], img_patch.shape[2]),
File "/home/gputest/ocrd-3.8/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/gputest/ocrd-3.8/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:
Detected at node 'model_2/conv1/Conv2D' defined at (most recent call last):
File "/home/gputest/ocrd-3.8/bin/ocrd-sbb-binarize", line 8, in <module>
sys.exit(cli())
Should I create another issue for this? (or re-open this one?)
Concerning "recap":
* I have made a `make ocrd` before `sudo make deps-cuda`
Yes, that's sufficient (for that task). In general, modules will ensure all updates are done. (And this is needed for sudo make deps-ubuntu, since that will also depend on all modules.)
Should I create another issue for this? (or re-open this one?)
Yes, please do. (This is new.)
https://github.com/qurator-spk/sbb_binarization/ would be best fit IMO.
Please also explain what version of the model you have installed (e.g. find /home/gputest/.local/share/ocrd-resources/ocrd-sbb-binarize/default-2021-03-09/saved_model_2021_03_09/ -exec md5sum {} \;)
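A slightly tidier variant of that command is sketched here. The helper name model_checksums is hypothetical; the only changes are -type f, so that md5sum is not handed directories, and a sort by file name so that two listings can be compared line by line.

```shell
# Hypothetical helper: checksum every regular file below a model directory.
# -type f skips the directories themselves, which md5sum cannot hash.
model_checksums() {
    find "$1" -type f -exec md5sum {} + | sort -k 2
}

# Usage (path taken from the log above):
#   model_checksums /home/gputest/.local/share/ocrd-resources/ocrd-sbb-binarize/default-2021-03-09/saved_model_2021_03_09/
```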
@bertsky :
On another machine I have tried to do the same new installation of ocrd_all.
This time I have called make modules (instead of make ocrd) before sudo ....
This make modules produces the following error:
Submodule path 'ocrd_cis': checked out 'a0ea0a2a4aeea99414c08ae543585b994f9ab0d5'
From https://github.com/cisocrgroup/ocrd_cis
* branch a0ea0a2a4aeea99414c08ae543585b994f9ab0d5 -> FETCH_HEAD
sem -q --will-cite --fg --id ocrd_all_git git submodule sync ocrd_detectron2
Synchronizing submodule url for 'ocrd_detectron2'
if git submodule status ocrd_detectron2 | grep -qv '^ '; then \
sem -q --will-cite --fg --id ocrd_all_git git submodule update --init ocrd_detectron2 && \
touch ocrd_detectron2; fi
error: Your local changes to the following files would be overwritten by checkout:
ocrd_detectron2/segment.py
Please commit your changes or stash them before you switch branches.
Aborting
fatal: Unable to checkout '04bf4c6d325ca383671e463543ffe132f3b70f19' in submodule path 'ocrd_detectron2'
make: *** [Makefile:189: ocrd_detectron2] Error 1
--> should I try with make ocrd? (or maybe you want to investigate this?)
error: Your local changes to the following files would be overwritten by checkout:
ocrd_detectron2/segment.py
Please commit your changes or stash them before you switch branches.
Aborting
Looks like you instrumented the code...
Yes, you are right.
git -C ocrd_detectron2 reset --hard has helped.
(I only get this error again: https://github.com/OCR-D/ocrd_all/issues/381, which I have ignored.)
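Before discarding edits with reset --hard, it can be worth looking at what would be lost. The sketch below uses a made-up helper name, local_changes, around git status --porcelain; stashing, as git's own error message suggests, is the alternative when the edits should be kept.

```shell
# Hypothetical helper: show tracked files with uncommitted edits in a checkout.
local_changes() {
    git -C "$1" status --porcelain --untracked-files=no
}

# Usage, from the ocrd_all checkout:
#   local_changes ocrd_detectron2           # lists modified tracked files, if any
#   git -C ocrd_detectron2 stash            # keep the edits for later, or
#   git -C ocrd_detectron2 reset --hard     # discard them for good
```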
Hi,
I have just installed the ocrd_all Release v2023-06-14, and it looks like I have an issue with GPU/CUDA.
Hint: I use Ubuntu 22.04.1
In detail: I have downloaded latest version with:
Then I have made:
I have created a new VENV like this:
(Remark: Python 3.8 was already available using "deadsnakes" repo)
Next I have done the "main make" like this:
--> I have used this "CUDA_VERSION" parameter, as on my system somehow CUDA-Version 11.6 is the default, and I have seen that CUDA-Version 11.8 was installed with sudo make deps-cuda.
This has run successfully with one remark: Somewhere in between I have seen this error message:
--> Nevertheless this make has run through successfully. (but maybe I have overlooked more "hidden" errors like this).
If I now do make test-cuda (in this VENV) I get this error: (I get the same error without using parameter "CUDA_VERSION")
At least: My standard test workflow works fine (but does not use the GPU).