NVIDIA / cccl

CUDA Core Compute Libraries
https://nvidia.github.io/cccl/

[FEA]: SHA Stable release URLs #622

Open jsharpe opened 11 months ago

jsharpe commented 11 months ago

Is this a duplicate?

Area

Infrastructure

Is your feature request related to a problem? Please describe.

The source tarballs generated for GitHub releases are not guaranteed to have stable SHAs over time, and they have changed on at least two occasions historically. Build systems such as Bazel verify the SHA of downloaded artifacts in order to guarantee that the sources haven't been tampered with. Changing SHAs cause build breakages in such build systems.
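As a rough illustration of the check described above, here is a minimal Python sketch of pinning and verifying a checksum. It is not Bazel's implementation; the function names `sha256_hex` and `verify_archive` and the sample bytes are hypothetical, chosen only for this example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_archive(data: bytes, expected_sha256: str) -> None:
    """Raise ValueError if the downloaded bytes do not match the pinned digest."""
    actual = sha256_hex(data)
    if actual != expected_sha256.lower():
        raise ValueError(
            f"checksum mismatch: expected {expected_sha256}, got {actual}"
        )


# Hypothetical archive contents; a real build would read the downloaded file.
archive_at_pin_time = b"archive bytes at pin time"
pinned = sha256_hex(archive_at_pin_time)

# Passes silently as long as the served bytes are identical to the pinned ones.
verify_archive(archive_at_pin_time, pinned)
```

If the server later returns different bytes for the same URL, even with identical source contents, `verify_archive` raises, which corresponds to the build breakage described here.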

Describe the solution you'd like

Upon release, a snapshot of the code at the point of release is uploaded as an artifact to the GitHub releases page. This artifact is guaranteed to have a stable SHA, so it won't cause future build breakages when GitHub changes the version of git used to generate the source archives.

Describe alternatives you've considered

No response

Additional context

No response

github-actions[bot] commented 11 months ago

Hi @jsharpe!

Thanks for submitting this issue - the CCCL team has been notified and we'll get back to you as soon as we can! In the meantime, feel free to add any relevant information to this issue.

leofang commented 11 months ago

Jake can correct me, but I doubt this is feasible within GitHub, as you pointed out. If your goal is to do repackaging, perhaps you can parse the JSON file associated with each CUDA release, within which CCCL is included? For example:

    "cuda_cccl": {
        "name": "CXX Core Compute Libraries",
        "license": "CUDA Toolkit",
        "license_path": "cuda_cccl/LICENSE.txt",
        "version": "12.3.52",
        "linux-x86_64": {
            "relative_path": "cuda_cccl/linux-x86_64/cuda_cccl-linux-x86_64-12.3.52-archive.tar.xz",
            "sha256": "659f8f8fd58eb7f5bc8ba171712147a007a2c8c92f30b21d135cf2d12f80226d",
            "md5": "e97f283762e4cc26a91368b545445888",
            "size": "1148140"
        },
        "linux-ppc64le": {
            "relative_path": "cuda_cccl/linux-ppc64le/cuda_cccl-linux-ppc64le-12.3.52-archive.tar.xz",
            "sha256": "1a188bc279ba32d910259bdf9b7106accacba163ce9ef92989af18ca8a50a6ea",
            "md5": "a2c10d1037efe99ff2bd840a552500a0",
            "size": "1148584"
        },
        "linux-sbsa": {
            "relative_path": "cuda_cccl/linux-sbsa/cuda_cccl-linux-sbsa-12.3.52-archive.tar.xz",
            "sha256": "96b5465af73b77447c3997d923916f7aea0939ffd0d8be42bc197ee8d8965fca",
            "md5": "7b33cb50efd7d589bc1da8fd292179b6",
            "size": "1147616"
        },
        "windows-x86_64": {
            "relative_path": "cuda_cccl/windows-x86_64/cuda_cccl-windows-x86_64-12.3.52-archive.zip",
            "sha256": "074c26ca05bd305ee23fdc56c3e8115d21c7843c5778fb75052791d66ee73711",
            "md5": "ac1857b6c543c5489c126fb6a1ef0b10",
            "size": "3041533"
        }
    },

from https://developer.download.nvidia.com/compute/cuda/redist/redistrib_12.3.0.json.
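A minimal Python sketch of the parsing suggested above, using only an excerpt of the JSON shown in this comment (no network access). The helper `archive_info` is hypothetical, and resolving `relative_path` against the directory of the manifest URL is an assumption rather than something stated in this thread.

```python
import json
from urllib.parse import urljoin

# Manifest URL quoted in this comment.
MANIFEST_URL = (
    "https://developer.download.nvidia.com/compute/cuda/redist/"
    "redistrib_12.3.0.json"
)

# Excerpt of the manifest shown above (one platform only).
MANIFEST_EXCERPT = """
{
  "cuda_cccl": {
    "version": "12.3.52",
    "linux-x86_64": {
      "relative_path": "cuda_cccl/linux-x86_64/cuda_cccl-linux-x86_64-12.3.52-archive.tar.xz",
      "sha256": "659f8f8fd58eb7f5bc8ba171712147a007a2c8c92f30b21d135cf2d12f80226d",
      "md5": "e97f283762e4cc26a91368b545445888",
      "size": "1148140"
    }
  }
}
"""


def archive_info(manifest: dict, package: str, platform: str) -> dict:
    """Return the download URL, SHA-256 and size for one package/platform pair."""
    entry = manifest[package][platform]
    return {
        # Assumption: relative_path is relative to the manifest's directory.
        "url": urljoin(MANIFEST_URL, entry["relative_path"]),
        "sha256": entry["sha256"],
        # The manifest stores the size as a string.
        "size": int(entry["size"]),
    }


info = archive_info(json.loads(MANIFEST_EXCERPT), "cuda_cccl", "linux-x86_64")
print(info["url"])
print(info["sha256"])
```

The returned `sha256` is the value a build system would pin for that URL.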

leofang commented 11 months ago

Though it seems the json file does not give the CCCL version as shown on GitHub...

jrhemstad commented 11 months ago

I want to make sure I understand what we're talking about here.

@jsharpe, so you're saying that the SHA for the release artifacts (e.g., the tar/zip files) for a release like this https://github.com/NVIDIA/cccl/releases/tag/v2.2.0 can change after the release has already been posted?

> Though it seems the json file does not give the CCCL version as shown on GitHub...

Yeah, it's a known issue that the cuda_cccl packages versioning doesn't match our actual library release version and instead just uses the corresponding CTK version. I'm still working on figuring out what to do about that.

jsharpe commented 11 months ago

> I want to make sure I understand what we're talking about here.
>
> @jsharpe, so you're saying that the SHA for the release artifacts (e.g., the tar/zip files) for a release like this https://github.com/NVIDIA/cccl/releases/tag/v2.2.0 can change after the release has already been posted?

Yes - see https://blog.bazel.build/2023/02/15/github-archive-checksum.html for bazel's postmortem from when this happened in Jan 2023. The reason is that the source tarballs are generated on demand; GitHub's scale prohibits it from storing the source archives for every release. However, build artifacts that are explicitly uploaded are kept verbatim.
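To illustrate why a regenerated archive can hash differently even though the sources are unchanged, here is a small Python sketch. It is a stand-in only: it varies the timestamp stored in the gzip header, whereas the incident referenced above involved a change in how the archives were produced. The payload is made up for the example.

```python
import gzip
import hashlib

# Hypothetical source content; identical for both archives.
payload = b"#pragma once\n" * 100

# Same payload, compressed twice with different gzip header timestamps.
archive_a = gzip.compress(payload, mtime=0)
archive_b = gzip.compress(payload, mtime=1)

same_contents = gzip.decompress(archive_a) == gzip.decompress(archive_b)
same_digest = (
    hashlib.sha256(archive_a).hexdigest() == hashlib.sha256(archive_b).hexdigest()
)

print("contents identical:", same_contents)
print("archive digests identical:", same_digest)
```

A checksum pinned against one archive fails against the other, although both unpack to the same bytes. An uploaded release asset avoids this because the stored file is served as-is rather than regenerated.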

> Though it seems the json file does not give the CCCL version as shown on GitHub...
>
> Yeah, it's a known issue that the cuda_cccl packages versioning doesn't match our actual library release version and instead just uses the corresponding CTK version. I'm still working on figuring out what to do about that.

The NVIDIA mirrors for CCCL are definitely a valid alternative to fetching from GitHub, again so long as these won't change over time. At this point we don't necessarily need automated tooling to find the version, so finding it manually is fine for now. Out of curiosity though: do the NVIDIA downloads contain binaries for the supported platforms? Just wondering why there are different files for each arch.

jrhemstad commented 11 months ago

Interesting! That makes sense now, thanks for explaining.

So you're saying that if we explicitly attach our own binaries to the release via the "Attach binaries here..." area, then those will always be stable because they aren't generated on demand?


> do the NVIDIA downloads contain binaries for the supported platforms? Just wondering why there are different files for each arch.

Everything in CCCL is header-only, so there aren't any binaries. I'm guessing the team that produces the NVIDIA downloads packages (not us) just automatically creates arch specific files for every package they produce.

jsharpe commented 11 months ago

Yes, that is correct. If you're interested in automating this, bazel-contrib/rules-template provides a GitHub action that does it on push of a tag.

jrhemstad commented 11 months ago

That sounds great. We can definitely do that.

Would you want the whole repo as a tar/zip? Or just the headers (i.e., the output from cmake install)?

jsharpe commented 11 months ago

We just need whatever is required to consume the library from a user's perspective, as we're not running tests or anything; so yes, the output of cmake install would be fine, and possibly preferable to keep bandwidth lower. In an ideal world it would also contain Bazel MODULE.bazel, WORKSPACE and BUILD.bazel files, but we can patch those in via the Bazel registry if you don't want to maintain those files in this repo. That said, I know that Isaac makes use of Bazel, so I assume at least some basic Bazel build files for the CCCL components exist somewhere inside NVIDIA?

jrhemstad commented 11 months ago

> In an ideal world it would also contain Bazel MODULE.bazel, WORKSPACE and BUILD.bazel files, but we can patch those in via the Bazel registry if you don't want to maintain those files in this repo

If you'd be willing to contribute and help maintain these, we'd be certainly willing to accept a PR with whatever additions are needed. None of us on our team have any Bazel experience, but we definitely want to make it easy for Bazel users to use CCCL.

jsharpe commented 11 months ago

Yes, I can likely do that. I'll get it working via the patch route first and then I can look to upstream those patches into this repo.

leofang commented 11 months ago

> do the NVIDIA downloads contain binaries for the supported platforms? Just wondering why there are different files for each arch.
>
> Everything in CCCL is header-only, so there aren't any binaries. I'm guessing the team that produces the NVIDIA downloads packages (not us) just automatically creates arch specific files for every package they produce.

Yes, that's right: the internal automation is set up to generate packages for the supported archs, but that pipeline has no notion of header-only libraries (CCCL is a unique case) and can't make exceptions AFAIK.

leofang commented 11 months ago

> Yes - see https://blog.bazel.build/2023/02/15/github-archive-checksum.html for bazel's postmortem from when this happened in Jan 2023.

Related: https://github.com/community/community/discussions/46034 (I don't know whether anything has changed since the GitHub PM raised this discussion, though; I didn't follow it)

wmaxey commented 11 months ago

> In an ideal world it would also contain Bazel MODULE.bazel, WORKSPACE and BUILD.bazel files, but we can patch those in via the Bazel registry if you don't want to maintain those files in this repo
>
> If you'd be willing to contribute and help maintain these, we'd be certainly willing to accept a PR with whatever additions are needed. None of us on our team have any Bazel experience, but we definitely want to make it easy for Bazel users to use CCCL.

We first need an action to install the headers into a tarball and attach that to a release. Then we can patch various build systems into it. Are there any others? CMake does it for us with our CMake install stuff, so we'd just need to manually copy and maintain bazel. I'd prefer if it was probably... configured by CMake? Would we want to ship the bazel bits with the CTK?
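As a rough sketch of the "install the headers into a tarball" step, the Python below packs a directory into a `.tar.gz` with normalized metadata so that repeated runs over the same files give the same bytes. It is not the project's release tooling: `build_reproducible_tarball` is a hypothetical helper, and the temporary directory with `include/example.h` only stands in for a real install tree.

```python
import gzip
import hashlib
import io
import tarfile
import tempfile
from pathlib import Path


def build_reproducible_tarball(install_dir: Path) -> bytes:
    """Pack every file under install_dir into a .tar.gz with normalized metadata."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        # Sort paths so entry order does not depend on filesystem order.
        files = sorted(p for p in install_dir.rglob("*") if p.is_file())
        for path in files:
            data = path.read_bytes()
            info = tarfile.TarInfo(name=path.relative_to(install_dir).as_posix())
            info.size = len(data)
            info.mtime = 0     # fixed timestamp instead of the file's mtime
            info.mode = 0o644  # fixed permissions
            # uid, gid, uname and gname keep TarInfo's defaults (0, 0, "", "").
            tar.addfile(info, io.BytesIO(data))
    # mtime=0 keeps the gzip header free of the current time.
    return gzip.compress(buf.getvalue(), mtime=0)


# Demo: a throwaway directory standing in for the install tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "include").mkdir()
    (root / "include" / "example.h").write_text("#pragma once\n")
    first = build_reproducible_tarball(root)
    second = build_reproducible_tarball(root)
    print(hashlib.sha256(first).hexdigest())
    print(first == second)
```

The compressed bytes can still vary between zlib versions, so this does not replace uploading the result once as a release asset; it only keeps the archive independent of file timestamps, ownership and directory ordering.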

alliepiper commented 11 months ago

We can easily add new install rules to our cmake scripts to add the bazel files to the install tree. Manual copying shouldn't be needed.

AustinSchuh commented 10 months ago

@jsharpe, the timing is perfect here :) I just started using clang 17 with cccl and bazel to do CUDA development targeting an Orin NX. https://github.com/frc971/971-Robot-Code/commit/ae856ca9c0df9d5485db46bbd341322e652c15e4 is the change I had to apply, if that helps you with the toolchain changes + build rules.

alliepiper commented 3 months ago

#1945 implements release automation that will provide stable zips and tarballs of the CCCL install tree.