OCR-D / ocrd_calamari

Recognize text using Calamari OCR and the OCR-D framework
Apache License 2.0

Using the ocrd/all:maximum-cuda Docker image: No supported devices found for platform CUDA #46

Closed · Witiko closed 4 years ago

Witiko commented 4 years ago

I am running ocrd-calamari-recognize in my workflow, using the ocrd/all:maximum-cuda Docker image. I have NVIDIA driver 455.23.05, CUDA 11.1, and two Tesla T4 GPUs. I can successfully run the following command:

$ docker run --rm --gpus all ocrd/all:maximum-cuda nvidia-smi

However, when I run my workflow, I can see in nvidia-smi that my GPUs are not used by ocrd-calamari-recognize. Do you have any idea why that could be?
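
A quick way to check whether TensorFlow inside the container sees the GPU at all is to ask it directly (the sub-venv interpreter path is an assumption on my part, based on the image's layout):

$ docker run --rm --gpus all ocrd/all:maximum-cuda \
>   /usr/local/sub-venv/headless-tf1/bin/python3 \
>   -c 'import tensorflow as tf; print(tf.test.is_gpu_available())'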

Witiko commented 4 years ago

I did not pass --gpus all when running my workflow. I am closing the issue.

Witiko commented 4 years ago

I am reopening the issue. After running Docker, ocrd-calamari-recognize dumps core with the following output (the last three lines seem the most relevant):

$ docker run --gpus all --rm -u `id -u` -v /var/tmp/ocrd-workspace/825/:/data -w /data -v /var/tmp/tesseract/calamari_models:/models -- ocrd/all:maximum-cuda nice -n 10 ocrd process "calamari-recognize -I OCR-D-SEG-LINE-RESEG-DEWARP -O OCR-D-OCR -P checkpoint /models/\*.ckpt.json"
2020-10-16 09:47:37,270.270 INFO ocrd.task_sequence.run_tasks - Start processing task 'calamari-recognize -I OCR-D-SEG-LINE-RESEG-DEWARP -O OCR-D-OCR -p '{"checkpoint": "/models/*.ckpt.json", "voter": "confidence_voter_default_ctc", "textequiv_level": "line", "glyph_conf_cutoff": 0.001}''
Traceback (most recent call last):
  File "/usr/bin/ocrd", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/ocrd/cli/process.py", line 26, in process_cli
    run_tasks(mets, log_level, page_id, tasks, overwrite)
  File "/usr/lib/python3.6/site-packages/ocrd/task_sequence.py", line 149, in run_tasks
    raise Exception("%s exited with non-zero return value %s. STDOUT:\n%s\nSTDERR:\n%s" % (task.executable, returncode, out, err))
Exception: ocrd-calamari-recognize exited with non-zero return value 134. STDOUT:
Checkpoint version 1 is up-to-date.
2020-10-16 09:47:46,804.804 WARNING tensorflow -
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

2020-10-16 09:47:46,807.807 WARNING tensorflow - From /usr/local/sub-venv/headless-tf1/lib/python3.6/site-packages/calamari_ocr/ocr/backends/tensorflow_backend/tensorflow_backend.py:15: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2020-10-16 09:47:46,807.807 WARNING tensorflow - From /usr/local/sub-venv/headless-tf1/lib/python3.6/site-packages/calamari_ocr/ocr/backends/tensorflow_backend/tensorflow_backend.py:16: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

STDERR:
2020-10-16 09:47:47.222778: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: no supported devices found for platform CUDA
/usr/bin/ocrd-calamari-recognize: line 2:   205 Aborted                 (core dumped) /usr/local/sub-venv/headless-tf1/bin/ocrd-calamari-recognize "$@"
Witiko commented 4 years ago

I was launching several jobs on a single machine. After reducing the workload, I am getting successful executions, so either I will encounter the error again with later jobs, or the issue was related to resource exhaustion.

Even then, the "no supported devices found for platform CUDA" message seems relevant: I am seeing no activity on the GPU except for a python process occupying 103 MiB of VRAM, and the speedup compared to a non-GPU node is only 1.5×. It seems that the GPU is not being used:

$ nvidia-smi
Fri Oct 16 13:52:43 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:3F:00.0 Off |                    0 |
| N/A   51C    P0    28W /  70W |    103MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:B2:00.0 Off |                    0 |
| N/A   48C    P0    27W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     11229      C   .../headless-tf1/bin/python3      103MiB |
+-----------------------------------------------------------------------------+
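
To catch any GPU activity during a run, a monitor can be kept going alongside the workflow; if utilization never rises above 0%, the GPU is idle:

$ watch -n 1 nvidia-smi
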
mikegerber commented 4 years ago

ocrd_calamari is going to be based on Calamari 1.0 and TF2 with a different CUDA version, so I am not investing time to track this down.

GPU scheduling is a pain, I exclusively run inference on CPU.
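
One way to pin a run to the CPU without a special image is to hide the devices from CUDA: either do not pass --gpus to docker run at all, or set CUDA_VISIBLE_DEVICES to an empty value (a sketch, with the remaining options elided):

$ docker run --rm -e CUDA_VISIBLE_DEVICES= ocrd/all:maximum-cuda ocrd-calamari-recognize ...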

Witiko commented 4 years ago

@mikegerber

ocrd_calamari is going to be based on Calamari 1.0 and TF2 with a different CUDA version, so I am not investing time to track this down.

According to the tensorflow compatibility table, TF2 uses CUDA 10.0 or 10.1, depending on the exact version of tensorflow.

GPU scheduling is a pain, I exclusively run inference on CPU.

So would I, but Calamari recognition on CPU takes a lot of time, almost half of the total workflow time; see below. I would appreciate suggestions on where to begin if I wanted to track this down myself.

[image: output-ocr-ocrd pie chart]

mikegerber commented 4 years ago

ocrd_calamari is going to be based on Calamari 1.0 and TF2 with a different CUDA version, so I am not investing time to track this down. According to the tensorflow compatibility table, TF2 uses CUDA 10.0 or 10.1, depending on the exact version of tensorflow.

Yes, that table says: a different CUDA version for TF2(.3), as I said.

mikegerber commented 4 years ago

Ah, here is the problem: the image uses CUDA Toolkit 11, which will not work with the pip-installed tensorflow-gpu 1.15 (see the table), nor will it work with the current pip-installed 2.3.

(The CUDA version nvidia-smi displays is the CUDA version the driver supports, not the CUDA Toolkit version.)

I am almost sure you never ran anything on the GPU with this setup.
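
Both versions can be checked inside the image (the version.txt location is the usual one in nvidia/cuda images, and the sub-venv path is taken from the stack trace above, so adjust as needed):

$ docker run --rm ocrd/all:maximum-cuda cat /usr/local/cuda/version.txt
$ docker run --rm ocrd/all:maximum-cuda \
>   /usr/local/sub-venv/headless-tf1/bin/python3 -c 'import tensorflow as tf; print(tf.__version__)'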

👀 @kba @bertsky

mikegerber commented 4 years ago

I can't get any output running ocrd_calamari through ocrd process, but calling it directly works:

% docker run --gpus all --rm -u `id -u` -v /tmp/actevedef_718448162.first-page+binarization+segmentation:/data -w /data -v /srv/data/qurator-data/calamari-models/GT4HistOCR/2019-07-22T15_49+0200/:/models -- ocrd/all:maximum-cuda ocrd-calamari-recognize -I OCR-D-SEG-LINE-SBB -O OCR-D-OCR -P checkpoint /models/\*.ckpt.json --overwrite
[...]
Using CUDNN compatible LSTM backend on CPU

Funnily enough, the process shows up in nvidia-smi anyway, with 100 MB used.

kba commented 4 years ago

I can't get any output running ocrd_calamari through ocrd process, but calling it directly works:

That is ocrd process's fault; see https://github.com/OCR-D/core/issues/592. It will be fixed.

Witiko commented 4 years ago

Funnily enough, the process shows up in nvidia-smi anyway, with 100 MB used.

That's what I am seeing as well.

kba commented 4 years ago

[image: output-ocr-ocrd pie chart]

OT: How did you create this pie chart? By parsing the log output? Can you share the tooling for this? I'd be interested.

Witiko commented 4 years ago

@kba I am running the individual steps in the workflow separately using GNU Parallel and saving the duration of the jobs using the --joblog option. The joblog can then be easily parsed and a pie chart drawn. If you'd like, I can share the Python code for that.
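
Roughly, the idea is that --joblog writes a TSV with a JobRuntime column, which can be aggregated per command; a minimal sketch of the aggregation step (not the actual code I use, which draws the chart in Python):

$ awk -F'\t' 'NR > 1 { split($9, cmd, " "); runtime[cmd[1]] += $4 }
>   END { for (c in runtime) printf "%s\t%.1fs\n", c, runtime[c] }' joblog.tsv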

kba commented 4 years ago

If you'd like, I can share the Python code for that.

I do! Thanks!

Witiko commented 4 years ago

@kba I created a Gist for you. I discuss usage in the comment below the code.

Witiko commented 4 years ago

@kba Since calamari-recognize takes up most of the time and tensorflow 1.15 requires CUDA Toolkit 10.0, would it make sense to base ocrd/all:maximum-cuda on nvidia/cuda:10.0-*-ubuntu18.04? Can you share which specific nvidia/cuda base image you are using for ocrd/all:maximum-cuda at the moment? Thanks!

Witiko commented 4 years ago

For the moment, I managed to get things running by building my own Docker image on top of nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04, just for ocrd_calamari, as follows:

$ git clone https://github.com/OCR-D/ocrd_all.git
$ cd ocrd_all
$ git apply
diff --git a/Dockerfile b/Dockerfile
index 3f6e30e..d515056 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -36,6 +36,10 @@ ENV VIRTUAL_ENV $PREFIX
 # make apt run non-interactive during build
 ENV DEBIAN_FRONTEND noninteractive

+ENV LC_ALL=C.UTF-8
+ENV LANG=C.UTF-8
+ENV TF_FORCE_GPU_ALLOW_GROWTH=true
+
 # make apt system functional
 RUN apt-get -y update \
  && apt-get install -y apt-utils
$ git submodule update --init
$ docker build \
> --build-arg BASE_IMAGE=nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04 \
> --build-arg OCRD_MODULES="core ocrd_calamari workflow-configuration" \
> -t ocrd_calamari .
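
The resulting image can then be run the same way as before (mounts as in my earlier invocation):

$ docker run --gpus all --rm -u `id -u` -v /var/tmp/ocrd-workspace/825/:/data -w /data \
>   -v /var/tmp/tesseract/calamari_models:/models -- ocrd_calamari \
>   ocrd-calamari-recognize -I OCR-D-SEG-LINE-RESEG-DEWARP -O OCR-D-OCR -P checkpoint /models/\*.ckpt.json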

Unless the LC_ALL and LANG environment variables are set to C.UTF-8, running python fails with the following error:

RuntimeError: Click will abort further execution because Python 3 was configured to use ASCII as encoding for the environment. Consult https://click.palletsprojects.com/en/7.x/python3/ for mitigation steps.

This system supports the C.UTF-8 locale which is recommended.
You might be able to resolve your issue by exporting the
following environment variables:

    export LC_ALL=C.UTF-8
    export LANG=C.UTF-8

Unless the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set to true, a single calamari-recognize will consume all VRAM, although it only needs ca. 4 GB:

Sat Oct 17 19:35:37 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:3F:00.0 Off |                    0 |
| N/A   50C    P0    28W /  70W |  14850MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:B2:00.0 Off |                    0 |
| N/A   48C    P0    27W /  70W |    106MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      4697      C   .../headless-tf1/bin/python3    14847MiB |
|    1   N/A  N/A      4697      C   .../headless-tf1/bin/python3      103MiB |
+-----------------------------------------------------------------------------+
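
Alternatively, instead of baking the variables into the image, passing them at run time should work just as well, assuming nothing in the image overrides them (mounts and processor options elided):

$ docker run --gpus all --rm \
>   -e LC_ALL=C.UTF-8 -e LANG=C.UTF-8 -e TF_FORCE_GPU_ALLOW_GROWTH=true \
>   ... ocrd_calamari ocrd-calamari-recognize ...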

Should I open a pull request for these?

kba commented 4 years ago

Can you share which specific nvidia/cuda base image you are using for ocrd/all:maximum-cuda at the moment?

https://github.com/OCR-D/core/blob/56c4aa61cc89949326f5314bc0b4685c502c73fd/Makefile#L208

docker-cuda: DOCKER_BASE_IMAGE = nvidia/cuda:11.0-runtime-ubuntu18.04
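
For a local build, overriding it on the make command line should work, since command-line variable definitions take precedence over target-specific assignments in GNU make:

$ make docker-cuda DOCKER_BASE_IMAGE=nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04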

Should I open a pull request for these?

Yes, please.

Witiko commented 4 years ago

@kba I opened two pull requests: https://github.com/OCR-D/core/pull/629 and https://github.com/OCR-D/ocrd_all/pull/212.