isuruf opened this issue 4 years ago
I'm not sure that is what that means.
Inside the cudatoolkit package, there are things like libcudart.so.10.2. If a library links against that (as cupy does; sorry, Azure changed its UI so you will need to scroll), then it will be broken.
cc @kkraus14 @mike-wendt
The relevant text from the release is:
Also in this release the soname of the libraries has been modified to not include the minor toolkit version number. For example, the cuFFT library soname has changed from libcufft.so.10.1 to libcufft.so.10. This is done to facilitate any future library updates that do not include API breaking changes without the need to relink.
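To make the SONAME situation concrete, here is a minimal sketch (assuming readelf from binutils is on PATH; the library and extension paths below are illustrative and would need adjusting) that prints the SONAME a toolkit library advertises and the NEEDED entries a dependent extension actually asks the loader for:

```python
import re
import subprocess

def dynamic_entries(path, tag):
    """Return the values of a dynamic-section tag (e.g. SONAME, NEEDED) via readelf -d."""
    out = subprocess.run(["readelf", "-d", path], capture_output=True,
                         text=True, check=True).stdout
    return re.findall(rf"\({tag}\)\s+.*\[(.+)\]", out)

# What the toolkit libraries advertise (paths are illustrative):
print(dynamic_entries("/usr/local/cuda-10.2/lib64/libcufft.so.10", "SONAME"))     # e.g. ['libcufft.so.10']
print(dynamic_entries("/usr/local/cuda-10.2/lib64/libcudart.so.10.2", "SONAME"))  # e.g. ['libcudart.so.10.2']

# What a dependent extension (e.g. a cupy .so; path hypothetical) asks the loader for:
print(dynamic_entries("site-packages/cupy/core/core.cpython-37m-x86_64-linux-gnu.so", "NEEDED"))
```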
My experience is that although the soname only includes the major version, relinking is still needed when switching between minor versions.
My experience is that although the soname only includes the major version, relinking is still needed when switching between minor versions.
This is my experience too. Please don’t do this before we can confirm NVIDIA stabilizes its versioning scheme. Think about the nuisance of 10.1 Update 0/1/2 not long ago...
My experience is that although the soname only includes the major version, relinking is still needed when switching between minor versions.
I don't understand. Can you explain?
If I install PyTorch and TensorFlow built with cudatoolkit 10.0, then remove cudatoolkit 10.0 and install 10.1, both fail to run test scripts:
# python gpu_test.py
2019-12-20 18:13:14.543411: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-12-20 18:13:14.571836: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-20 18:13:14.572530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1660 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.845
pciBusID: 0000:01:00.0
2019-12-20 18:13:14.572616: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2019-12-20 18:13:14.572666: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2019-12-20 18:13:14.572699: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2019-12-20 18:13:14.572730: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2019-12-20 18:13:14.572761: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2019-12-20 18:13:14.572797: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2019-12-20 18:13:14.574955: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-12-20 18:13:14.574967: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2019-12-20 18:13:14.575226: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-12-20 18:13:14.597185: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2019-12-20 18:13:14.599534: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5598694d25c0 executing computations on platform Host. Devices:
2019-12-20 18:13:14.599599: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2019-12-20 18:13:14.599790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-20 18:13:14.599836: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]
2019-12-20 18:13:14.662175: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-20 18:13:14.662776: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x559869535230 executing computations on platform CUDA. Devices:
2019-12-20 18:13:14.662791: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce GTX 1660 Ti, Compute Capability 7.5
Traceback (most recent call last):
File "gpu_test.py", line 5, in <module>
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py", line 227, in constant
allow_broadcast=True)
File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py", line 235, in _constant_impl
t = convert_to_eager_tensor(value, ctx, dtype)
File "/opt/conda/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py", line 96, in convert_to_eager_tensor
return ops.EagerTensor(value, ctx.device_name, dtype)
RuntimeError: /job:localhost/replica:0/task:0/device:GPU:0 unknown device.
# conda activate pytorch
(pytorch) [root@chi9 io]# python pytorch_test.py
Traceback (most recent call last):
File "pytorch_test.py", line 3, in <module>
import torch
File "/opt/conda/envs/pytorch/lib/python3.7/site-packages/torch/__init__.py", line 81, in <module>
from torch._C import *
ImportError: libcudart.so.10.0: cannot open shared object file: No such file or directory
Both of these packages are trying to dlopen the major.minor libraries. Perhaps it is possible for these projects to switch to using the major-only libraries, but this is not how they are currently set up.
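For illustration only (a sketch; which soname actually resolves depends on the cudatoolkit build installed in the environment), the dlopen behavior boils down to:

```python
import ctypes

# A package built against CUDA 10.0 effectively asks the loader for the major.minor name:
try:
    ctypes.CDLL("libcudart.so.10.0")   # fails once cudatoolkit 10.0 has been removed
except OSError as exc:
    print("major.minor lookup failed:", exc)

# A package that only requested the major-version soname could be satisfied by any
# 10.x toolkit that ships a major-only SONAME (10.1 onwards for most libraries):
ctypes.CDLL("libcufft.so.10")
```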
That doesn't work. This only works from 10.1 onwards, as the link mentions. Try doing the same with 10.1 and 10.2.
Unfortunately, I do not have packages nor a machine configured to test 10.1 vs 10.2 at the moment.
That doesn't work. This only works from 10.1 onwards, as the link mentions. Try doing the same with 10.1 and 10.2.
@isuruf, as noted above it doesn't. libcudart includes the major and minor version in the SONAME.
Ah, then we should split cudatoolkit into two packages so that CUDA packages built with 10.1 will get the benefits of 10.2 where applicable.
Examining the runtime Docker images from Docker Hub, it appears that most of the libraries use a major-only SONAME, but three (libcudart.so, libnvrtc-builtins.so and libnvrtc.so.10.2) use major.minor.
These two groups could be made into two different conda packages so that the compatible libraries can be installed into a 10.1 environment. The existing cudatoolkit packages will likely need to have a run_constrained entry added to avoid clobbering.
Ah, then we should split cudatoolkit into two packages so that CUDA packages built with 10.1 will get the benefits of 10.2 where applicable.
That's an interesting idea. Could be reasonable. Have not personally explored this.
@kkraus14 @mike-wendt, do you have any thoughts on this idea?
I'm not opposed to the idea of turning cudatoolkit into a metapackage and breaking it up. What would be the proposed split of packages?
IIUC it would be split along the lines of which libraries include the CUDA minor version (like .1 or .2) in their SONAME or not. Though I suppose it could be more granular than that. Does this sound correct to you @isuruf, or did you have something else in mind?
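A rough sketch of how that grouping could be derived mechanically (assuming readelf is available and that the directory below points at the unpacked cudatoolkit libraries; both are assumptions, not part of any existing tooling):

```python
import re
import subprocess
from pathlib import Path

def soname(path):
    """Extract the SONAME of a shared library, or None if it has none."""
    out = subprocess.run(["readelf", "-d", str(path)], capture_output=True, text=True).stdout
    m = re.search(r"\(SONAME\)\s+.*\[(.+)\]", out)
    return m.group(1) if m else None

libdir = Path("/usr/local/cuda-10.2/lib64")   # hypothetical location of the toolkit libraries

major_only, major_minor = set(), set()
for lib in sorted(libdir.glob("lib*.so.*")):
    name = soname(lib)
    if name is None:
        continue
    version = name.split(".so.")[-1]          # "10" for libcufft.so.10, "10.2" for libcudart.so.10.2
    (major_only if "." not in version else major_minor).add(name)

print("major-only SONAME (could float within 10.x):", sorted(major_only))
print("major.minor SONAME (needs an exact pin):    ", sorted(major_minor))
```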
Sorry for a stupid question: if we split cudatoolkit, what would happen when we check the runtime versions via cudaRuntimeGetVersion and the individual libraries' APIs? Detecting versions at runtime correctly is important, at least for CuPy afaik.
I think cudaRuntimeGetVersion comes from the CUDA Runtime API (libcudart). So that would still be tracking the patch version.
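For reference, this is roughly what that query looks like from Python (a sketch; the exact soname to load depends on the installed toolkit):

```python
import ctypes

cudart = ctypes.CDLL("libcudart.so.10.2")     # soname is illustrative; match the installed toolkit
version = ctypes.c_int()
err = cudart.cudaRuntimeGetVersion(ctypes.byref(version))
assert err == 0, f"cudaRuntimeGetVersion returned error {err}"

# The value is encoded as 1000 * major + 10 * minor, e.g. 10020 for CUDA 10.2.
major, minor = version.value // 1000, (version.value % 1000) // 10
print(f"CUDA runtime {major}.{minor} (raw value {version.value})")
```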
Thanks @jakirkham. So it sounds like with the split we could have a 10.2 runtime coexist with, say, a 10.1 cuFFT or cuRAND.
Sorry I wasn't paying attention to @jjhelmus's original comment:
it appears that most of the libraries use a major-only SONAME, but three (libcudart.so, libnvrtc-builtins.so and libnvrtc.so.10.2) use major.minor. These two groups could be made into two different conda packages so that the compatible libraries can be installed into a 10.1 environment. The existing cudatoolkit packages will likely need to have a run_constrained entry added to avoid clobbering.
So would this work for applications depending on NVRTC, built with 10.1, and running with 10.2? I don't see any guarantee of API/ABI compatibility mentioned in NVRTC's documentation, so if its SONAMEs include major.minor, this is a bit worrying...
So would this work for applications depending on NVRTC, built with 10.1, and running with 10.2? I don't see any guarantee of API/ABI compatibility mentioned in NVRTC's documentation, so if its SONAMEs include major.minor, this is a bit worrying
Please read @jjhelmus's comment carefully. NVRTC (and CUDART) would be in the group of libraries pinned to major.minor, and the others would be in the group pinned to major only.
See https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-general-new-features
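As a footnote to the runtime-detection concern above: NVRTC exposes its own version query, nvrtcVersion, so a consumer such as CuPy can at least confirm which NVRTC it actually loaded. A minimal sketch (the soname below is illustrative):

```python
import ctypes

nvrtc = ctypes.CDLL("libnvrtc.so.10.2")       # NVRTC keeps major.minor in its SONAME
major, minor = ctypes.c_int(), ctypes.c_int()
err = nvrtc.nvrtcVersion(ctypes.byref(major), ctypes.byref(minor))
assert err == 0, f"nvrtcVersion returned error {err}"
print(f"NVRTC {major.value}.{minor.value}")
```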