multiple Tensorflow / CUDA versions again

bertsky commented 3 years ago

To my knowledge, despite our efforts to work around the Tensorflow dependency hell (each TF version being tied closely to a narrow range of CUDA / Python / Numpy versions, and CUDA in turn depending on certain libcudnn / nvidia-driver versions), we have not yet tackled the problem of giving GPU access to multiple OCR-D processors that rely on different TF versions at the same time.

However, for native installations, the solution is not far away: Since Nvidia put the version numbers into all the package names, it is in principle possible to install multiple versions of CUDA runtime and cuDNN at the same time – as long as they all can agree on a suitable nvidia-driver (which is usually the newest; luckily, this one appears to be largely backwards compatible). The problem is that TF loads the libcudart dynamically and to that end, needs the right version in the dynamic linker/loader's search path. But the CUDA packages seem to only activate the last installed CUDA toolkit in ld.so.conf. This is easily fixed, however:

# get them all
apt install cuda-10-0 cuda-10-1 cuda-10-2 cuda-11-0 cuda-11-1 libcudnn7 libcudnn8

# /etc/ld.so.conf.d/cuda.conf:
/usr/local/cuda-10.0/lib64
/usr/local/cuda-10.0/targets/x86_64-linux/lib
/usr/local/cuda-10.1/lib64
/usr/local/cuda-10.1/targets/x86_64-linux/lib
/usr/local/cuda-10.2/lib64
/usr/local/cuda-10.2/targets/x86_64-linux/lib
/usr/local/cuda-11.0/lib64
/usr/local/cuda-11.0/targets/x86_64-linux/lib
/usr/local/cuda-11.1/lib64
/usr/local/cuda-11.1/targets/x86_64-linux/lib

This _does_ work:

for venv in venv/local/sub-venv/headless-tf*; do . $venv/bin/activate && python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"; done

```
2021-06-16 22:07:03.732233: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-06-16 22:07:03.755631: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3399905000 Hz
2021-06-16 22:07:03.756510: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4769990 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-06-16 22:07:03.756579: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-06-16 22:07:03.766686: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-06-16 22:07:03.864881: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:03.867961: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x47f8450 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-06-16 22:07:03.867978: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA GeForce GTX 1080, Compute Capability 6.1
2021-06-16 22:07:03.868119: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:03.868427: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: name: NVIDIA GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.797 pciBusID: 0000:01:00.0
2021-06-16 22:07:03.868621: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-06-16 22:07:03.869537: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-06-16 22:07:03.870339: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-06-16 22:07:03.870557: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-06-16 22:07:03.871640: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-06-16 22:07:03.872446: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-06-16 22:07:03.875056: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-06-16 22:07:03.875163: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:03.875540: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:03.875825: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2021-06-16 22:07:03.875868: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-06-16 22:07:03.876405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-16 22:07:03.876430: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 0
2021-06-16 22:07:03.876435: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0: N
2021-06-16 22:07:03.876515: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:03.876831: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:03.877132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/device:GPU:0 with 7611 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
True
2021-06-16 22:07:04.128308: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:From :1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2021-06-16 22:07:05.085985: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-06-16 22:07:05.086493: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-06-16 22:07:05.087099: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-06-16 22:07:05.125844: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:05.126506: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1080 computeCapability: 6.1 coreClock: 1.797GHz coreCount: 20 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 298.32GiB/s
2021-06-16 22:07:05.126547: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-06-16 22:07:05.140285: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-06-16 22:07:05.140365: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-06-16 22:07:05.142732: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-06-16 22:07:05.143184: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-06-16 22:07:05.146045: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-06-16 22:07:05.148365: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-06-16 22:07:05.148606: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-06-16 22:07:05.148751: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:05.149475: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:05.150075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-06-16 22:07:05.150119: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-06-16 22:07:05.518880: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-16 22:07:05.518911: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-06-16 22:07:05.518918: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-06-16 22:07:05.519067: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:05.519449: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:05.519766: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:05.520066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/device:GPU:0 with 7424 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
True
2021-06-16 22:07:06.659267: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-06-16 22:07:06.683630: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3399905000 Hz
2021-06-16 22:07:06.684752: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x44ef1f0 executing computations on platform Host. Devices:
2021-06-16 22:07:06.684824: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2021-06-16 22:07:06.690851: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-06-16 22:07:06.774100: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:06.774496: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4578500 executing computations on platform CUDA. Devices:
2021-06-16 22:07:06.774513: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): NVIDIA GeForce GTX 1080, Compute Capability 6.1
2021-06-16 22:07:06.774626: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:06.774907: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: name: NVIDIA GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.797 pciBusID: 0000:01:00.0
2021-06-16 22:07:06.775077: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-06-16 22:07:06.775975: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-06-16 22:07:06.776779: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-06-16 22:07:06.776996: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-06-16 22:07:06.778054: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-06-16 22:07:06.778882: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-06-16 22:07:06.781527: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-06-16 22:07:06.781637: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:06.782008: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:06.782294: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2021-06-16 22:07:06.782340: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-06-16 22:07:06.782879: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-16 22:07:06.782904: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2021-06-16 22:07:06.782910: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2021-06-16 22:07:06.782991: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:06.783319: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-16 22:07:06.783620: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 7611 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
True
```
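Independent of TF, it may help to check which libcudart versions the dynamic loader actually knows about after running ldconfig. The following is only a sketch: `list_cudart_versions` is a made-up helper name, and it assumes the usual `ldconfig -p` line format (`libname (arch) => path`).

```shell
# Hypothetical helper: read `ldconfig -p` style output on stdin and print
# the distinct libcudart versions registered in the loader cache.
list_cudart_versions() {
    # keep only versioned entries such as "libcudart.so.10.0 (...) => ..."
    sed -n 's/^[[:space:]]*libcudart\.so\.\([0-9.]*\) .*/\1/p' | LC_ALL=C sort -u
}

# Assumed usage:
#   ldconfig -p | list_cudart_versions
```

If a version that some venv needs is missing from that list, its TF would presumably fall back to CPU.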

(Not entirely sure whether we need all of cuda-XY or just individual parts like cuda-cudart-XY cuda-curand-XY cuda-cusolver-XY cuda-cusparse-XY cuda-cublas-XY cuda-cufft-XY though.)

Thus, all we have to do is document this in the README (and maybe add rules to deps-ubuntu).
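Such a rule would not have to hard-code the version list. As a rough sketch (the function name `gen_cuda_ldconf` is hypothetical, and the directory layout is assumed to be the Ubuntu one from the config above), the config file could be generated from whatever is installed:

```shell
# Hypothetical helper (not part of ocrd_all): print one ld.so.conf line per
# CUDA library directory found under PREFIX (default: /usr/local).
gen_cuda_ldconf() {
    prefix="${1:-/usr/local}"
    for dir in "$prefix"/cuda-*/lib64 "$prefix"/cuda-*/targets/x86_64-linux/lib; do
        # an unmatched glob stays literal, so test for a real directory
        if [ -d "$dir" ]; then
            echo "$dir"
        fi
    done | LC_ALL=C sort
}

# Assumed usage, as root:
#   gen_cuda_ldconf > /etc/ld.so.conf.d/cuda.conf && ldconfig
```

The function only prints paths, so it can be tried without root before writing anything to /etc.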

For the Docker option, it's the same story: As long as we need to build a fat image accommodating all modules, we have to do the same as above within Docker. Until now, we chose the oldest base image nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04 for ocrd/core-cuda, because we usually needed the TF1 processors to have GPU access more than the TF2 processors. However, with the knowledge from above, we can work our way backwards from an image with the newest nvidia-driver, and install the older CUDA versions in there – via the same extended makefile rules.

kba commented 3 years ago

Thanks for bringing this up. So the fix you're proposing is to add a CUDA configuration to /etc/ld.so.conf.d with all the versions of the DLL? That is refreshingly simple.

So, yes, we should document this in the README (and possibly provide a sudo make link-cuda command).

And to solve this for the Docker images, we'd base the ocrd/core-cuda image on the latest release and install the older CUDA versions into it?

If I got this right, then let's implement this.

bertsky commented 3 years ago

> Thanks for bringing this up. So the fix you're proposing is to add a CUDA configuration to /etc/ld.so.conf.d with all the versions of the DLL? That is refreshingly simple.

Yes, it would seem so.

> So, yes, we should document this in the README (and possibly provide a sudo make link-cuda command).

Embarrassingly, (to my knowledge) we do not provide any documentation on CUDA/GPU setup yet. Not here and not in the setup guide. Besides the information above we would first need to explain which CUDA version is needed for which processors. According to the official compatibility matrix and the experiments conducted by @mikegerber, we'll need up to 11.2 for newest TF 2.5 (which gets dragged in by ocrd_pc_segmentation, and currently also ocrd_calamari) and down to 10.0 for TF 1.15 (still needed by most TF processors).

Not all native installations will already have the driver required for the newest CUDA toolkit. The official requirement for 11.3.1 seems to be nvidia-driver-450 (i.e. 450.80.02) on Linux, and version 456.38 on Windows. So users might choose not to maximally upgrade their system if the processors they need are already supported by a lower CUDA version and driver. (We should document the above table alongside a list of our current processor requirements.)
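For the documentation, a small check of the installed driver against such a minimum could look like the sketch below. `version_ge` is a hypothetical helper, and it relies on `sort -V` (version sort) from GNU coreutils being available.

```shell
# Hypothetical helper: succeed if version $1 is at least version $2.
# Assumes GNU sort with -V (version sort).
version_ge() {
    # the smaller of the two sorts first; $1 >= $2 iff that one is $2
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumed usage (requires nvidia-smi from the driver package):
#   driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   version_ge "$driver" 450.80.02 || echo "driver $driver too old for CUDA 11.3"
```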

But since we are targeting Ubuntu, I think we should go further than just providing some sudo make cuda-ldconfig knob: we could provide some top-level sudo make cuda-ubuntu which

- registers the Nvidia apt repos and key signature (but doing that fully automatically might violate their terms; they want users to accept their EULA by downloading these)
- does some form of apt-get install cuda-runtime-10-0 cuda-runtime-10-1 cuda-runtime-10-2 cuda-runtime-11-0 cuda-runtime-11-1 cuda-runtime-11-2 cuda-11-3
- finally triggers the cuda-ldconfig knob

(Also, I am not sure about the paths; CUDA toolkit packages for other systems might use different paths for the libraries. So it's probably all Ubuntu specific anyway.)
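The install step could derive the package names from a plain list of versions, so that the same list can be documented next to the processor requirements. This is just a sketch; `cuda_runtime_packages` is a hypothetical helper, and it only covers the cuda-runtime-X-Y naming used above.

```shell
# Hypothetical helper: map CUDA versions ("10.0 11.1") to runtime package
# names following the cuda-runtime-X-Y pattern, one per line.
cuda_runtime_packages() {
    for version in "$@"; do
        # package names use a dash where the version uses a dot
        printf 'cuda-runtime-%s\n' "$(echo "$version" | tr . -)"
    done
}

# Assumed usage, as root, with the Nvidia apt repo already registered:
#   apt-get install -y $(cuda_runtime_packages 10.0 10.1 10.2 11.0 11.1)
```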

> And to solve this for the Docker images, we'd base the ocrd/core-cuda image on the latest release and install the older CUDA versions into it?

Yes – also with the above knob. (So for the ocrd_all native we need make cuda-ldconfig here, but for ocrd/all and ocrd/core-cuda we need it in core's makefile / Dockerfile / CircleCI cfg.)

bertsky commented 3 years ago

It turns out that there's still a problem with this: images like nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu18.04 contain a package cuda-toolkit-config-common, which has a file /etc/ld.so.conf.d/000_cuda.conf that conflicts with other cudart packages (although no package conflict is registered in the metadata, and we don't need this config file anyway with our cuda-ldconfig rule). This is the error I am seeing:

dpkg: error processing archive /var/cache/apt/archives/cuda-cudart-11-2_11.2.152-1_amd64.deb (--unpack):
 trying to overwrite '/etc/ld.so.conf.d/000_cuda.conf', which is also in package cuda-toolkit-config-common 11.4.43-1

If I force the installation of the deb with dpkg -i --force-all cuda-cudart-11-2_11.2.152-1_amd64.deb then all runs smoothly. But I don't quite know how to script this yet.
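One way to script it might be to let apt pass a narrower dpkg option through instead of calling dpkg by hand. The sketch below is untested on those images: `install_overwriting` and the `APT_GET` override are made up for illustration, while `-o Dpkg::Options::=...` and dpkg's `--force-overwrite` are the standard apt and dpkg options.

```shell
# Hypothetical wrapper: install packages while allowing dpkg to overwrite a
# file owned by another package (here: /etc/ld.so.conf.d/000_cuda.conf).
# APT_GET can be overridden, e.g. with "echo", to see the command only.
install_overwriting() {
    ${APT_GET:-apt-get} install -y -o Dpkg::Options::=--force-overwrite "$@"
}

# Assumed usage, as root:
#   install_overwriting cuda-cudart-11-2
```

Compared to --force-all, this only relaxes the file-overwrite check, which is the one that fails in the error above.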

Update: Further investigation reveals that this is a specific problem of cuda-cudart-11-2 (or in turn cuda-runtime-11-2), which AFAICT we don't strictly need, as long as we have the newer and older versions.

But the new core-cuda image is huge: 12 GB instead of 1 GB.

mikegerber commented 3 years ago

Here's a list of compatible PyPI-TensorFlow versions and CUDA Toolkit versions: https://github.com/mikegerber/test-nvidia#results

I'm not testing TF2.1 there, is this version needed for ocropy and anybaseocr?

bertsky commented 2 years ago

> I'm not testing TF2.1 there, is this version needed for ocropy and anybaseocr?

As #289 shows, we can make ocrd_anybaseocr work with newer TF, so this problem will soon be gone.

The above ideas have already been merged in #270. The only problem currently is that the most recent Docker prebuild of the maximum-cuda variant did not complete, which will be solved by #287.

So when everything falls into place, I think we can close #279. But this one we can already close.

OCR-D / ocrd_all

multiple Tensorflow / CUDA versions again #263