Closed Witiko closed 4 years ago
I did not pass --gpus all when running my workflow. I am closing the issue.
I am reopening the issue. After running Docker, ocrd-calamari-recognize
dumps core with the following output (last three lines seem the most relevant):
$ docker run --gpus all --rm -u `id -u` -v /var/tmp/ocrd-workspace/825/:/data -w /data -v /var/tmp/tesseract/calamari_models:/models -- ocrd/all:maximum-cuda nice -n 10 ocrd process "calamari-recognize -I OCR-D-SEG-LINE-RESEG-DEWARP -O OCR-D-OCR -P checkpoint /models/\*.ckpt.json"
2020-10-16 09:47:37,270.270 INFO ocrd.task_sequence.run_tasks - Start processing task 'calamari-recognize -I OCR-D-SEG-LINE-RESEG-DEWARP -O OCR-D-OCR -p '{"checkpoint": "/models/*.ckpt.json", "voter": "confidence_voter_default_ctc", "textequiv_level": "line", "glyph_conf_cutoff": 0.001}''
Traceback (most recent call last):
File "/usr/bin/ocrd", line 8, in <module>
sys.exit(cli())
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/ocrd/cli/process.py", line 26, in process_cli
run_tasks(mets, log_level, page_id, tasks, overwrite)
File "/usr/lib/python3.6/site-packages/ocrd/task_sequence.py", line 149, in run_tasks
raise Exception("%s exited with non-zero return value %s. STDOUT:\n%s\nSTDERR:\n%s" % (task.executable, returncode, out, err))
Exception: ocrd-calamari-recognize exited with non-zero return value 134. STDOUT:
Checkpoint version 1 is up-to-date.
2020-10-16 09:47:46,804.804 WARNING tensorflow -
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
2020-10-16 09:47:46,807.807 WARNING tensorflow - From /usr/local/sub-venv/headless-tf1/lib/python3.6/site-packages/calamari_ocr/ocr/backends/tensorflow_backend/tensorflow_backend.py:15: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2020-10-16 09:47:46,807.807 WARNING tensorflow - From /usr/local/sub-venv/headless-tf1/lib/python3.6/site-packages/calamari_ocr/ocr/backends/tensorflow_backend/tensorflow_backend.py:16: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
STDERR:
2020-10-16 09:47:47.222778: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: no supported devices found for platform CUDA
/usr/bin/ocrd-calamari-recognize: line 2: 205 Aborted (core dumped) /usr/local/sub-venv/headless-tf1/bin/ocrd-calamari-recognize "$@"
I was launching several jobs on a single machine. After reducing the workload, I am getting successful executions, so either I will encounter the error with later jobs, or the issue was related to resource exhaustion.
Even then, the "no supported devices found for platform CUDA" error seems relevant: I am seeing no activity on the GPU except for python occupying 103 MiB of VRAM, and the speedup compared to a non-GPU node is only 1.5×. It seems that the GPU is not used:
$ nvidia-smi
Fri Oct 16 13:52:43 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05 Driver Version: 455.23.05 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:3F:00.0 Off | 0 |
| N/A 51C P0 28W / 70W | 103MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla T4 Off | 00000000:B2:00.0 Off | 0 |
| N/A 48C P0 27W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 11229 C .../headless-tf1/bin/python3 103MiB |
+-----------------------------------------------------------------------------+
ocrd_calamari is going to be based on Calamari 1.0 and TF2 with a different CUDA version, so I am not investing time to track this down.
GPU scheduling is a pain, I exclusively run inference on CPU.
@mikegerber
ocrd_calamari is going to be based on Calamari 1.0 and TF2 with a different CUDA version, so I am not investing time to track this down.
According to the TensorFlow compatibility table, TF2 uses CUDA 10.0 or 10.1, depending on the exact version of TensorFlow.
GPU scheduling is a pain, I exclusively run inference on CPU.
So would I, but the Calamari recognition on CPU takes a lot of time, almost half of the total workflow time, see below. I would appreciate suggestions on where to begin if I wanted to track this down myself.
ocrd_calamari is going to be based on Calamari 1.0 and TF2 with a different CUDA version, so I am not investing time to track this down. According to the tensorflow compatibility table, TF2 uses CUDA 10.0 or 10.1 according to the exact version of tensorflow.
Yes, that table says: different CUDA version for TF2(.3), as I said.
Ah, here is the problem: the image uses CUDA Toolkit 11, which will not work with the pip-installed tensorflow-gpu 1.15 (see table), nor will it work with the current pip-installed 2.3.
(The CUDA version nvidia-smi displays is the CUDA version the driver supports, not the CUDA Toolkit version.)
I am almost sure you never ran anything on the GPU with this setup.
👀 @kba @bertsky
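To make the incompatibility concrete, the relevant rows of the TensorFlow tested-build-configurations table can be encoded as a small lookup. The helper below is purely illustrative and not part of any OCR-D tooling:

```python
# Relevant rows of the TensorFlow "tested build configurations" table
# (https://www.tensorflow.org/install/source#gpu).
REQUIRED_CUDA_TOOLKIT = {
    "1.15": "10.0",  # pip tensorflow-gpu 1.15
    "2.3": "10.1",   # pip tensorflow 2.3
}

def toolkit_matches(tf_version, cuda_toolkit):
    """Return True if the pip-installed TF wheel was built for this CUDA Toolkit."""
    return REQUIRED_CUDA_TOOLKIT.get(tf_version) == cuda_toolkit
```

Neither supported TF version was built for the Toolkit 11 that ships in the image, which is consistent with the "no supported devices found for platform CUDA" abort above.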
I can't get any output running ocrd_calamari through ocrd process, but calling it directly:
% docker run --gpus all --rm -u `id -u` -v /tmp/actevedef_718448162.first-page+binarization+segmentation:/data -w /data -v /srv/data/qurator-data/calamari-models/GT4HistOCR/2019-07-22T15_49+0200/:/models -- ocrd/all:maximum-cuda ocrd-calamari-recognize -I OCR-D-SEG-LINE-SBB -O OCR-D-OCR -P checkpoint /models/\*.ckpt.json --overwrite
[...]
Using CUDNN compatible LSTM backend on CPU
Funny enough, the process shows up in nvidia-smi anyway, with 100 MB used.
I can't get any output running ocrd_calamari through ocrd process, but calling it directly:
That is ocrd process's fault, see https://github.com/OCR-D/core/issues/592, will be fixed.
Funny enough, the process shows up in nvidia-smi anyway, with 100 MB used.
That's what I am seeing as well.
OT: How did you create this pie chart? Parsing the log output? Can you share the tooling for this, I'd be interested.
@kba I am running the individual steps in the workflow separately using GNU Parallel and saving the duration of the jobs using the --joblog option. The joblog can then be easily parsed and a pie chart drawn. If you'd like, I can share the Python code for that.
If you'd like, I can share the Python code for that.
I do! Thanks!
@kba I created a Gist for you. I discuss usage in the comment below the code.
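The core of such a joblog parser might look like the following. This is a sketch of the approach, not the actual Gist; it relies on GNU Parallel's --joblog format, a tab-separated table with Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal, and Command columns, and assumes the step name is the first token of the logged command:

```python
import csv
import io

def step_durations(joblog_text):
    """Sum GNU Parallel --joblog runtimes per workflow step.

    Groups by the first token of the Command column, i.e. the
    executable name of the OCR-D processor.
    """
    totals = {}
    reader = csv.DictReader(io.StringIO(joblog_text), delimiter="\t")
    for row in reader:
        step = row["Command"].split()[0]
        totals[step] = totals.get(step, 0.0) + float(row["JobRuntime"])
    return totals
```

The resulting per-step totals can then be fed straight into matplotlib's pie() to get the chart.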
@kba Since calamari-recognize takes up most of the time and TensorFlow 1.15 requires CUDA Toolkit 10.0, would it make sense to base ocrd/all:maximum-cuda on nvidia/cuda:10.0-*-ubuntu18.04? Can you share which specific nvidia/cuda base image you are using for ocrd/all:maximum-cuda at the moment? Thanks!
For the moment, I managed to get things running by building my own Docker image on top of nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04, just for ocrd_calamari, as follows:
$ git clone https://github.com/OCR-D/ocrd_all.git
$ cd ocrd_all
$ git apply
diff --git a/Dockerfile b/Dockerfile
index 3f6e30e..d515056 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -36,6 +36,10 @@ ENV VIRTUAL_ENV $PREFIX
# make apt run non-interactive during build
ENV DEBIAN_FRONTEND noninteractive
+ENV LC_ALL=C.UTF-8
+ENV LANG=C.UTF-8
+ENV TF_FORCE_GPU_ALLOW_GROWTH=true
+
# make apt system functional
RUN apt-get -y update \
&& apt-get install -y apt-utils
$ git submodule update --init
$ docker build \
> --build-arg BASE_IMAGE=nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04 \
> --build-arg OCRD_MODULES="core ocrd_calamari workflow-configuration" \
> -t ocrd_calamari .
Unless the LC_ALL and LANG environment variables are C.UTF-8, running Python fails with the following error:
RuntimeError: Click will abort further execution because Python 3 was configured to use ASCII as encoding for the environment. Consult https://click.palletsprojects.com/en/7.x/python3/ for mitigation steps.
This system supports the C.UTF-8 locale which is recommended.
You might be able to resolve your issue by exporting the
following environment variables:
export LC_ALL=C.UTF-8
export LANG=C.UTF-8
Unless the TF_FORCE_GPU_ALLOW_GROWTH environment variable is true, a single calamari-recognize will consume all VRAM, although it only needs ca. 4 GB:
Sat Oct 17 19:35:37 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05 Driver Version: 455.23.05 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:3F:00.0 Off | 0 |
| N/A 50C P0 28W / 70W | 14850MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla T4 Off | 00000000:B2:00.0 Off | 0 |
| N/A 48C P0 27W / 70W | 106MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 4697 C .../headless-tf1/bin/python3 14847MiB |
| 1 N/A N/A 4697 C .../headless-tf1/bin/python3 103MiB |
+-----------------------------------------------------------------------------+
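Besides baking the variable into the image as in the Dockerfile patch above, it can also be set per process. A minimal sketch (only the env-var name comes from the text above; the rest is illustrative):

```python
import os

# TensorFlow consults TF_FORCE_GPU_ALLOW_GROWTH when it first initializes
# the GPU, so the variable must be in the environment before that happens,
# e.g. at the top of the driver script or via `docker run -e`.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import tensorflow as tf  # GPU memory would now be allocated on demand
```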
Should I open a pull request for these?
Can you share, which specific nvidia/cuda base image you are using for ocrd/all:maximum-cuda at the moment?
https://github.com/OCR-D/core/blob/56c4aa61cc89949326f5314bc0b4685c502c73fd/Makefile#L208
docker-cuda: DOCKER_BASE_IMAGE = nvidia/cuda:11.0-runtime-ubuntu18.04
Should I open a pull request for these?
Yes please
@kba I opened two pull requests: https://github.com/OCR-D/core/pull/629 and https://github.com/OCR-D/ocrd_all/pull/212.
I am using ocrd-calamari-recognize in my workflow using the ocrd/all:maximum-cuda Docker image. I have NVIDIA driver 455.23.05, CUDA 11.1, and two Tesla T4 GPUs. I can successfully run the following command:
However, when I run my workflow, I can see in nvidia-smi that my GPUs are not used by ocrd-calamari-recognize. Do you have any idea why that could be?