Closed bertsky closed 2 years ago
Thanks for bringing this up. So the fix you're proposing is to add a CUDA configuration to `/etc/ld.so.conf.d` with all the versions of the DLL? That is refreshingly simple.

So, yes, we should document this in the README (and possibly provide a `sudo make link-cuda` command).

And to solve this for the Docker images, we'd base the `ocrd/core-cuda` image on the latest release and install the older CUDA versions into it?

If I got this right, then let's implement this.
> Thanks for bringing this up. So the fix you're proposing is to add a CUDA configuration to `/etc/ld.so.conf.d` with all the versions of the DLL? That is refreshingly simple.

Yes, it would seem so.
> So, yes, we should document this in the README (and possibly provide a `sudo make link-cuda` command).

Embarrassingly, (to my knowledge) we do not provide any documentation on CUDA/GPU setup yet. Not here and not in the setup guide. Besides the information above, we would first need to explain which CUDA version is needed for which processors. According to the official compatibility matrix and the experiments conducted by @mikegerber, we'll need up to 11.2 for newest TF 2.5 (which gets dragged in by ocrd_pc_segmentation, and currently also ocrd_calamari) and down to 10.0 for TF 1.15 (still needed by most TF processors).

Not all native installations will already have the driver required for the newest CUDA toolkit. The official requirement for 11.3.1 seems to be `nvidia-driver-450` (i.e. 450.80.02) on Linux, and version 456.38 on Windows. So users might choose not to maximally upgrade their system if the processors they need are already supported by a lower CUDA version and driver. (We should document the above table alongside a list of our current processor requirements.)
But since we are targeting Ubuntu, I think we should go further than just providing some `sudo make cuda-ldconfig` knob: we could provide some top-level `sudo make cuda-ubuntu` which runs `apt-get install cuda-runtime-10-0 cuda-runtime-10-1 cuda-runtime-10-2 cuda-runtime-11-0 cuda-runtime-11-1 cuda-runtime-11-2 cuda-11-3` and then invokes the `cuda-ldconfig` knob.

(Also, I am not sure about the paths; CUDA toolkit packages for other systems might use different paths for the libraries. So it's probably all Ubuntu specific anyway.)
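A rough sketch of what such a `cuda-ldconfig` rule could do, written as a shell function. The `/usr/local/cuda-*/lib64` layout and the file name `cuda.conf` are assumptions (as noted, other systems may use different paths), and `write_cuda_ldconf` is a hypothetical helper name, not something that exists in the makefile:

```shell
#!/bin/sh
# Sketch: list every installed CUDA library directory in one ld.so config file.
# Assumption: toolkits live under <root>/cuda-X.Y/lib64 (e.g. /usr/local on Ubuntu);
# adjust the glob for other layouts.
write_cuda_ldconf() {
    cuda_root=${1:-/usr/local}
    conf_file=${2:-/etc/ld.so.conf.d/cuda.conf}
    : > "$conf_file"                    # start from an empty file
    for libdir in "$cuda_root"/cuda-*/lib64; do
        [ -d "$libdir" ] || continue    # skip the unexpanded glob if nothing matches
        echo "$libdir" >> "$conf_file"
    done
}

# Real use (needs root), followed by a cache refresh:
#   write_cuda_ldconf /usr/local /etc/ld.so.conf.d/cuda.conf && ldconfig
```

Taking the root directory and the target file as arguments keeps the function testable without root privileges; only the final `ldconfig` call needs them.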
> And to solve this for the Docker images, we'd base the `ocrd/core-cuda` image on the latest release and install the older CUDA versions into it?

Yes, also with the above knob. (So for the ocrd_all native installation we need `make cuda-ldconfig` here, but for ocrd/all and ocrd/core-cuda we need it in core's makefile / Dockerfile / CircleCI cfg.)
It turns out that there's still a problem with this: images like `nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu18.04` contain a package `cuda-toolkit-config-common`, which has a file `/etc/ld.so.conf.d/000_cuda.conf` that conflicts with other cudart packages (although no package conflict is registered in the metadata, and we don't need this config file anyway with our `cuda-ldconfig` rule). This is the error I am seeing:

dpkg: error processing archive /var/cache/apt/archives/cuda-cudart-11-2_11.2.152-1_amd64.deb (--unpack):
 trying to overwrite '/etc/ld.so.conf.d/000_cuda.conf', which is also in package cuda-toolkit-config-common 11.4.43-1

If I force the installation of the deb with `dpkg -i --force-all cuda-cudart-11-2_11.2.152-1_amd64.deb` then all runs smoothly. But I don't quite know how to script this yet.
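One way this might be scripted is to pass dpkg's `--force-overwrite` option through apt via `Dpkg::Options`, which is narrower than `--force-all`. This is an untested sketch: whether it resolves this particular conflict in the Nvidia images is an assumption, and `install_overwriting` and `DRY_RUN` are hypothetical names introduced for illustration:

```shell
#!/bin/sh
# Sketch: install packages while letting dpkg overwrite files owned by other packages.
# With DRY_RUN=1 the command is only printed, so it can be inspected first.
install_overwriting() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo apt-get install -y -o Dpkg::Options::=--force-overwrite "$@"
    else
        apt-get install -y -o Dpkg::Options::=--force-overwrite "$@"
    fi
}

# Example (prints the command instead of running it):
#   DRY_RUN=1 install_overwriting cuda-cudart-11-2
```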
Update: Further investigation reveals that this is a specific problem of `cuda-cudart-11-2` (or in turn `cuda-runtime-11-2`), which AFAICT we don't strictly need, as long as we have the newer and older versions.

But the new core-cuda image is huge: 12 GB instead of 1 GB.
Here's a list of compatible PyPI-TensorFlow versions and CUDA Toolkit versions: https://github.com/mikegerber/test-nvidia#results
I'm not testing TF2.1 there, is this version needed for ocropy and anybaseocr?
> I'm not testing TF2.1 there, is this version needed for ocropy and anybaseocr?

As #289 shows, we can make ocrd_anybaseocr work with newer TF, so this problem will soon be gone.
The above ideas have been merged in #270 already; the only remaining problem is that the most recent Docker prebuild of the maximum-cuda variant did not complete, which will be solved by #287.
So when everything falls into place, I think we can close #279. But this one we can already close.
To my knowledge, despite our efforts to work around the TensorFlow dependency hell (each TF version being tied closely to a narrow range of CUDA / Python / Numpy versions, and in turn CUDA being dependent on certain libcudnn / nvidia-driver), we have not yet tackled the problem of providing GPU access to multiple OCR-D processors relying on different TF versions at the same time.
However, for native installations, the solution is not far away: Since Nvidia put the version numbers into all the package names, it is in principle possible to install multiple versions of CUDA runtime and cuDNN at the same time, as long as they all can agree on a suitable `nvidia-driver` (which is usually the newest; luckily, this one appears to be largely backwards compatible). The problem is that TF loads libcudart dynamically and, to that end, needs the right version in the dynamic linker/loader's search path. But the CUDA packages seem to only activate the last installed CUDA toolkit in ld.so.conf. This is easily fixed, however:

This _does_ work:

`for venv in venv/local/sub-venv/headless-tf*; do . $venv/bin/activate && python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"; done`

```
2021-06-16 22:07:03.732233: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-06-16 22:07:03.755631: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3399905000 Hz
2021-06-16 22:07:03.756510: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4769990 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-06-16 22:07:03.756579: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-06-16 22:07:03.766686: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-06-16 22:07:03.864881: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:03.867961: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x47f8450 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-06-16 22:07:03.867978: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA GeForce GTX 1080, Compute Capability 6.1
2021-06-16 22:07:03.868119: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:03.868427: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: NVIDIA GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.797
pciBusID: 0000:01:00.0
2021-06-16 22:07:03.868621: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-06-16 22:07:03.869537: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-06-16 22:07:03.870339: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-06-16 22:07:03.870557: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-06-16 22:07:03.871640: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-06-16 22:07:03.872446: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-06-16 22:07:03.875056: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-06-16 22:07:03.875163: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:03.875540: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:03.875825: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2021-06-16 22:07:03.875868: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-06-16 22:07:03.876405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-16 22:07:03.876430: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 0
2021-06-16 22:07:03.876435: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0: N
2021-06-16 22:07:03.876515: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:03.876831: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:03.877132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/device:GPU:0 with 7611 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
True
2021-06-16 22:07:04.128308: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:From:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2021-06-16 22:07:05.085985: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-06-16 22:07:05.086493: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-06-16 22:07:05.087099: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-06-16 22:07:05.125844: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:05.126506: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.797GHz coreCount: 20 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 298.32GiB/s
2021-06-16 22:07:05.126547: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-06-16 22:07:05.140285: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-06-16 22:07:05.140365: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-06-16 22:07:05.142732: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-06-16 22:07:05.143184: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-06-16 22:07:05.146045: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-06-16 22:07:05.148365: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-06-16 22:07:05.148606: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-06-16 22:07:05.148751: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:05.149475: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:05.150075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-06-16 22:07:05.150119: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-06-16 22:07:05.518880: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-16 22:07:05.518911: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-06-16 22:07:05.518918: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-06-16 22:07:05.519067: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:05.519449: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:05.519766: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:05.520066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/device:GPU:0 with 7424 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
True
2021-06-16 22:07:06.659267: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-06-16 22:07:06.683630: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3399905000 Hz
2021-06-16 22:07:06.684752: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x44ef1f0 executing computations on platform Host. Devices:
2021-06-16 22:07:06.684824: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2021-06-16 22:07:06.690851: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-06-16 22:07:06.774100: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:06.774496: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4578500 executing computations on platform CUDA. Devices:
2021-06-16 22:07:06.774513: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): NVIDIA GeForce GTX 1080, Compute Capability 6.1
2021-06-16 22:07:06.774626: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:06.774907: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: NVIDIA GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.797
pciBusID: 0000:01:00.0
2021-06-16 22:07:06.775077: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-06-16 22:07:06.775975: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-06-16 22:07:06.776779: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-06-16 22:07:06.776996: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-06-16 22:07:06.778054: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-06-16 22:07:06.778882: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-06-16 22:07:06.781527: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-06-16 22:07:06.781637: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:06.782008: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:06.782294: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2021-06-16 22:07:06.782340: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-06-16 22:07:06.782879: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-16 22:07:06.782904: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2021-06-16 22:07:06.782910: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2021-06-16 22:07:06.782991: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:06.783319: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:06.783620: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 7611 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
True
```
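Since the log above warns that `tf.test.is_gpu_available()` is deprecated, the same check could be sketched with `tf.config.list_physical_devices('GPU')`, as the warning itself suggests. To my knowledge that call is missing in older TF releases such as 1.15, so this sketch falls back to the old call there; `check_gpu_in_venvs` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: report GPU visibility for TensorFlow in each given virtual environment.
check_gpu_in_venvs() {
    for venv in "$@"; do
        if [ ! -f "$venv/bin/activate" ]; then
            echo "$venv: no bin/activate, skipping"
            continue
        fi
        # Subshell, so one venv's activation does not leak into the next iteration.
        ( . "$venv/bin/activate" && python -c "
import tensorflow as tf
try:
    print(bool(tf.config.list_physical_devices('GPU')))
except AttributeError:  # older TF without this call
    print(tf.test.is_gpu_available())
" )
    done
}

# Example:
#   check_gpu_in_venvs venv/local/sub-venv/headless-tf*
```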
(Not entirely sure whether we need all of `cuda-XY` or just individual parts like `cuda-cudart-XY cuda-curand-XY cuda-cusolver-XY cuda-cusparse-XY cuda-cublas-XY cuda-cufft-XY` though.)

Thus, all we have to do is document this in the README (and maybe add rules to `deps-ubuntu`).

For the Docker option, it's the same story: As long as we need to build a fat image accommodating all modules, we have to do the same as above within Docker. Until now, we chose the oldest base image `nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04` for `ocrd/core-cuda`, because we usually needed the TF1 processors to have GPU access more than the TF2 processors. However, with the knowledge from above, we can work our way backwards from an image with the newest nvidia-driver, and install the older CUDA versions in there, via the same extended makefile rules.