Closed by kmittman 4 months ago
I think the cleanest fix is to update the schema ...
For example, option A, with an array:
"version": "8.9.1.23",
"linux-x86_64": [
  {
    "relative_path": "cudnn/linux-x86_64/cudnn-linux-x86_64-8.9.1.23_cuda11-archive.tar.xz",
    "sha256": "a6d9887267e28590c9db95ce65cbe96a668df0352338b7d337e0532ded33485c",
    "md5": "56a15f6a9b85b0be2f005a1e3715d506",
    "size": "903887852"
  },
  {
    "relative_path": "cudnn/linux-x86_64/cudnn-linux-x86_64-8.9.1.23_cuda12-archive.tar.xz",
    "sha256": "35163c5c542be0c511738b27e25235193cbeedc5e0e006e44b1cdeaf1922e83e",
    "md5": "fe41922f07a13da7b1593639adb0e32c",
    "size": "903519652"
  }
],
Or option B, with a key:
"version": "8.9.1.23",
"linux-x86_64": {
  "cuda11": {
    "relative_path": "cudnn/linux-x86_64/cudnn-linux-x86_64-8.9.1.23_cuda11-archive.tar.xz",
    "sha256": "a6d9887267e28590c9db95ce65cbe96a668df0352338b7d337e0532ded33485c",
    "md5": "56a15f6a9b85b0be2f005a1e3715d506",
    "size": "903887852"
  },
  "cuda12": {
    "relative_path": "cudnn/linux-x86_64/cudnn-linux-x86_64-8.9.1.23_cuda12-archive.tar.xz",
    "sha256": "35163c5c542be0c511738b27e25235193cbeedc5e0e006e44b1cdeaf1922e83e",
    "md5": "fe41922f07a13da7b1593639adb0e32c",
    "size": "903519652"
  }
},
Unfortunately, updating the schema will break existing scripts, such as the parse_redistrib.py one included in this repo.
Another open question is whether it makes sense to apply this retroactively or only going forward.
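To make the breakage concrete, here is a minimal sketch of a v2-style lookup run against both proposed layouts. The function `v2_relative_path` and the shortened paths are hypothetical stand-ins for illustration; this is not the actual code in parse_redistrib.py.

```python
def v2_relative_path(package, platform):
    """Hypothetical v2-style lookup: assumes exactly one archive per platform."""
    return package[platform]["relative_path"]


# Shortened, made-up paths; only the nesting matters here.
v2 = {"linux-x86_64": {"relative_path": "pkg-archive.tar.xz"}}
option_a = {
    "linux-x86_64": [
        {"relative_path": "pkg_cuda11-archive.tar.xz"},
        {"relative_path": "pkg_cuda12-archive.tar.xz"},
    ]
}
option_b = {
    "linux-x86_64": {
        "cuda11": {"relative_path": "pkg_cuda11-archive.tar.xz"},
        "cuda12": {"relative_path": "pkg_cuda12-archive.tar.xz"},
    }
}


def describe(package):
    """Return the path, or the name of the exception the old lookup raises."""
    try:
        return v2_relative_path(package, "linux-x86_64")
    except (KeyError, TypeError) as error:
        return type(error).__name__


for label, package in [("v2", v2), ("option A", option_a), ("option B", option_b)]:
    print(label, "->", describe(package))
```

Under these assumptions the array layout fails with a TypeError (a list indexed by a string) and the keyed layout fails with a KeyError, so either option needs a parser update.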
Option A, if chosen, definitely needs a discriminator field, e.g.
"version": "8.9.1.23",
"linux-x86_64": [
  {
    "relative_path": "cudnn/linux-x86_64/cudnn-linux-x86_64-8.9.1.23_cuda11-archive.tar.xz",
    "cuda": "11",
    ...
  },
  {
    "relative_path": "cudnn/linux-x86_64/cudnn-linux-x86_64-8.9.1.23_cuda12-archive.tar.xz",
    "cuda": "12",
    ...
  }
],
Otherwise the official interface becomes "parse the URL to infer CUDA version". This applies to strings like "cuda12" to some extent as well.
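As a rough illustration of the point above, this sketch contrasts a lookup on the explicit "cuda" field with the file-name parsing a consumer would otherwise fall back to. It assumes the Option A array layout; the function names and the regular expression are hypothetical and not part of any existing tooling.

```python
import re

# Option A entries with the proposed "cuda" discriminator (hashes omitted).
entries = [
    {
        "relative_path": "cudnn/linux-x86_64/cudnn-linux-x86_64-8.9.1.23_cuda11-archive.tar.xz",
        "cuda": "11",
    },
    {
        "relative_path": "cudnn/linux-x86_64/cudnn-linux-x86_64-8.9.1.23_cuda12-archive.tar.xz",
        "cuda": "12",
    },
]


def pick_by_field(entries, cuda_major):
    """Select an entry using the explicit discriminator field."""
    for entry in entries:
        if entry.get("cuda") == cuda_major:
            return entry
    return None


def pick_by_path(entries, cuda_major):
    """Fallback without a discriminator: infer the CUDA major version
    from the archive file name (an assumed naming pattern)."""
    pattern = re.compile(r"_cuda(\d+)-archive\.")
    for entry in entries:
        match = pattern.search(entry["relative_path"])
        if match and match.group(1) == cuda_major:
            return entry
    return None


print(pick_by_field(entries, "12")["relative_path"])
print(pick_by_path(entries, "12")["relative_path"])
```

Both calls find the same entry today, but the second one silently depends on the file naming convention never changing.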
Is there any possibility that cudnn may later impose even more complex constraints?
I have a slight preference for the following approach: adding an additional top-level field cuda_ver (final name TBD) to indicate how many variants one should expect below:
"version": "8.9.1.23",
"cuda_ver": ["11", "12"], # not sure if minor versions should be allowed here, TBD
"linux-x86_64": {
  "cuda11": {
    "relative_path": "cudnn/linux-x86_64/cudnn-linux-x86_64-8.9.1.23_cuda11-archive.tar.xz",
    ...
  },
  "cuda12": {
    "relative_path": "cudnn/linux-x86_64/cudnn-linux-x86_64-8.9.1.23_cuda12-archive.tar.xz",
    ...
  }
}
Then perhaps it is less important to pick between Option A and B. It's easier to parse this IMHO.
I haven't any objections to @leofang's suggestion. As far as nixpkgs is concerned, I think it's equivalent to option B: we have several package set instances, e.g. cudaPackages_11 and cudaPackages_12, which choose a JSON manifest based on the cudatoolkit semver. We already select cudnn releases based on the cudatoolkit minor version, just using some pretty clumsy logic and compatibility tables that are maintained manually (copied over from the cudnn release notes).
With the proposed change I think we'd try to just pick a manifest attribute by name, be that "${cudaMajorVersion}" == "12" or "cuda${cudaMajorVersion}" == "cuda12", likely without looking at "cuda_ver".
The questions I do have about "cuda_ver" are whether the order of the list is significant and, more generally, what the implied contract for the field is.
On a related note, @kmittman, what range of compatibility guarantees are the "cuda11"/"cuda12" keys meant to suggest? The cudnn manual/release notes seem to only make promises about minor versions, not entire major versions. Should the manifests maybe also include explicit compatibility metadata? I'd be happy to delete https://github.com/NixOS/nixpkgs/blob/8f7c43426a2dc5dac9d8aaa4f616c6002ded891d/pkgs/development/libraries/science/math/cudnn/releases.nix if this were an option.
Both of you have made really good points, thank you @SomeoneSerge and @leofang very much! I need some time to ponder the best option in general for CMake, Conda, Nixpkgs, Debian, RPM, etc.
Regarding #2, I think I could inject min/max into the template, though TBH an accurate range would be difficult to maintain: each tarball file is tagged with some key-value metadata at creation time, which is later parsed to generate the JSON manifests.
As for what that would look like, here are some proposals:
"cuda": { "min": "11.2.0", "max": "11.8.0" },
"cuda": { "min": "12.0", "max": "12.9999" },
"cuda": { "ge": "12.0.0", "lt": "13" },
"minCudaVersion": "11.0", "maxCudaVersion": "11.8"
"cuda_min": "11", "cuda_max": "11"
"depends": { "cuda": "11" },
Combining some of the current suggestions, I'm wondering about something like:
{
  "release_date": "2023-05-05",
  "release_label": "8.9.1",
  "cudnn": {
    "name": "NVIDIA CUDA Deep Neural Network library",
    "license": "cudnn",
    "license_path": "cudnn/LICENSE.txt",
    "version": "8.9.1.23",
    "cuda_ver": [
      "11",
      "12"
    ],
    "linux-x86_64": {
      "11": {
        "cuda_min": "11.2",
        "cuda_max": "11.8",
        "relative_path": "cudnn/linux-x86_64/cudnn-linux-x86_64-8.9.1.23_cuda11-archive.tar.xz",
        "sha256": "a6d9887267e28590c9db95ce65cbe96a668df0352338b7d337e0532ded33485c",
        "md5": "56a15f6a9b85b0be2f005a1e3715d506",
        "size": "903887852"
      },
      "12": {
        "cuda_min": "12",
        "cuda_max": "12",
        "relative_path": "cudnn/linux-x86_64/cudnn-linux-x86_64-8.9.1.23_cuda12-archive.tar.xz",
        "sha256": "35163c5c542be0c511738b27e25235193cbeedc5e0e006e44b1cdeaf1922e83e",
        "md5": "fe41922f07a13da7b1593639adb0e32c",
        "size": "903519652"
      }
    }
  }
}
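For a sense of how a consumer might use this combined layout, here is a minimal sketch that picks an archive for a given CUDA toolkit version. The helper names are hypothetical, and the range semantics are an assumption: each bound is compared only to the precision it specifies, so "cuda_max": "12" is read as "any 12.x". The thread has not settled on that interpretation.

```python
import json

# Trimmed copy of the combined proposal (hashes and sizes omitted).
MANIFEST = json.loads("""
{
  "cudnn": {
    "version": "8.9.1.23",
    "cuda_ver": ["11", "12"],
    "linux-x86_64": {
      "11": {
        "cuda_min": "11.2",
        "cuda_max": "11.8",
        "relative_path": "cudnn/linux-x86_64/cudnn-linux-x86_64-8.9.1.23_cuda11-archive.tar.xz"
      },
      "12": {
        "cuda_min": "12",
        "cuda_max": "12",
        "relative_path": "cudnn/linux-x86_64/cudnn-linux-x86_64-8.9.1.23_cuda12-archive.tar.xz"
      }
    }
  }
}
""")


def parse_version(text):
    """Turn "11.8" into (11, 8)."""
    return tuple(int(part) for part in text.split("."))


def in_range(version, minimum, maximum):
    """Inclusive check. ASSUMPTION: a bound is compared only to the
    precision it specifies, so a maximum of "12" accepts any 12.x."""
    v = parse_version(version)
    lo, hi = parse_version(minimum), parse_version(maximum)
    return v[:len(lo)] >= lo and v[:len(hi)] <= hi


def select_archive(manifest, package, platform, cuda_version):
    """Pick the variant by CUDA major version, then verify the range."""
    pkg = manifest[package]
    major = cuda_version.split(".")[0]
    if major not in pkg.get("cuda_ver", []):
        raise LookupError(f"no CUDA {major} variant of {package}")
    entry = pkg[platform][major]
    if not in_range(cuda_version, entry["cuda_min"], entry["cuda_max"]):
        raise LookupError(f"CUDA {cuda_version} is outside the supported range")
    return entry


print(select_archive(MANIFEST, "cudnn", "linux-x86_64", "11.8")["relative_path"])
```

With explicit bounds in the manifest, a lookup for CUDA 11.1 can fail loudly instead of relying on a separately maintained compatibility table.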
Just chiming in: I would absolutely love it if the manifest included compatible CUDA ranges. That would save me from needing to maintain them elsewhere, as Serge pointed out.
Is there anything I can do to assist?
I'm planning to start implementation work on this. I have a few open questions, @ConnorBaker, @SomeoneSerge, @leofang:
The v3 schema is a breaking change. For existing v2 manifests:
  a. Leave them alone
  b. Update them in-place
  c. Update them with another filename
Any other feedback about the min/max CUDA version? For RPM/Debian, we use deps based on libcudart.so.$cudaMajor
Once I have something working, I'll post a generated sample JSON manifest and work on updating the Python example in this repo.
Okay, here's what I've got: redistrib_1.2.3.json
{
  "release_date": "2023-06-20",
  "release_label": "1.2.3",
  "release_product": "placeholder",
  "libplaceholder": {
    "name": "NVIDIA Placeholder",
    "license": "custom",
    "license_path": "libplaceholder/LICENSE.txt",
    "version": "1.2.3.4",
    "linux-x86_64": {
      "cuda12": {
        "relative_path": "libplaceholder/linux-x86_64/libplaceholder-linux-x86_64-1.2.3.4_cuda12-archive.tar.xz",
        "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
        "md5": "d41d8cd98f00b204e9800998ecf8427e",
        "size": "1156992"
      },
      "cuda11": {
        "relative_path": "libplaceholder/linux-x86_64/libplaceholder-linux-x86_64-1.2.3.4_cuda11-archive.tar.xz",
        "sha256": "01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b",
        "md5": "68b329da9893e34099c7d8ad5cb9c940",
        "size": "1126204"
      }
    },
    "cuda_variant": [
      "12",
      "11"
    ]
  }
}
and the non-variant manifest remains mostly intact: redistrib_0.1.0.json
{
  "release_date": "2023-06-20",
  "release_label": "0.1.0",
  "release_product": "foobar",
  "libfoobar": {
    "name": "NVIDIA Foo Bar",
    "license": "custom",
    "license_path": "libfoobar/LICENSE.txt",
    "version": "0.1.0.9",
    "linux-x86_64": {
      "relative_path": "libfoobar/linux-x86_64/libfoobar-linux-x86_64-0.1.0.9-archive.tar.xz",
      "sha256": "36a9e7f1c95b82ffb99743e0c5c4ce95d83c9a430aac59f84ef3cbfab6145068",
      "md5": "7215ee9c7d9dc229d2921a40e899ec5f",
      "size": "1743028"
    }
  }
}
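A parser that copes with both sample manifests could look roughly like the sketch below. It is not the updated Python example from this repo. The function `iter_archives` is hypothetical, and it assumes that every dict-valued key inside a package is a platform and that variant keys are spelled "cuda" plus an entry from "cuda_variant".

```python
# Trimmed copies of the two sample manifests (hashes and sizes omitted).
VARIANT = {
    "release_label": "1.2.3",
    "libplaceholder": {
        "version": "1.2.3.4",
        "linux-x86_64": {
            "cuda12": {"relative_path": "libplaceholder/linux-x86_64/libplaceholder-linux-x86_64-1.2.3.4_cuda12-archive.tar.xz"},
            "cuda11": {"relative_path": "libplaceholder/linux-x86_64/libplaceholder-linux-x86_64-1.2.3.4_cuda11-archive.tar.xz"},
        },
        "cuda_variant": ["12", "11"],
    },
}
FLAT = {
    "release_label": "0.1.0",
    "libfoobar": {
        "version": "0.1.0.9",
        "linux-x86_64": {"relative_path": "libfoobar/linux-x86_64/libfoobar-linux-x86_64-0.1.0.9-archive.tar.xz"},
    },
}


def iter_archives(manifest):
    """Yield (package, platform, cuda_variant_or_None, entry) tuples."""
    for name, package in manifest.items():
        if not isinstance(package, dict):
            continue  # release_date, release_label, release_product
        variants = package.get("cuda_variant")
        for platform, value in package.items():
            if not isinstance(value, dict):
                continue  # name, license, version, the cuda_variant list
            if variants:
                # Variant manifest: one archive per listed CUDA major version.
                for variant in variants:
                    entry = value.get("cuda" + variant)
                    if entry is not None:
                        yield name, platform, variant, entry
            else:
                # Non-variant manifest: the platform maps straight to the archive.
                yield name, platform, None, value


for manifest in (VARIANT, FLAT):
    for name, platform, variant, entry in iter_archives(manifest):
        print(name, platform, variant, entry["relative_path"])
```

Keying the branch on the presence of "cuda_variant" keeps the non-variant path identical to the old behavior, which matches the note that those manifests remain mostly intact.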
For example: cuDNN redistrib_8.9.1.23 has both CUDA 11.x and CUDA 12.x tarballs.
The JSON manifest includes references to
but is missing
Related to bin-archive v3 format item