NVIDIA / cccl

CUDA Core Compute Libraries

[FEA]: Setup nightly CI #1619

Open alliepiper opened 2 months ago

alliepiper commented 2 months ago

Area

Infrastructure

Is your feature request related to a problem? Please describe.

We would like to add more expensive CI tests (e.g. #1507, #1618), but these are too costly to run on every PR. A nightly CI run on main would give us a place to add these more expensive checks.

Describe the solution you'd like

### Tasks
- [x] Add a new GHA job that runs nightly and does a specialized build/test cycle.
- [x] Update make_devcontainers.sh
- [x] Remove skipped jobs from UI
- [x] Update cudax infra PR (#1485)
- [ ] https://github.com/NVIDIA/cccl/issues/1700
- [ ] Figure out how to notify the team of errors in non-PR workflows ([RAPIDS docs for Slack notifications](https://docs.rapids.ai/resources/github-actions/#subscribing-to-nightlies))
- [x] Add coverage for more Thrust host.device configs (currently just cpp.cuda is tested)
- [x] Add `test-cpu` / `test-gpu` job types (for thrust device=<CPU backend> coverage)
- [ ] Don't explode GPUs in matrix and enable build -> test(gpu1), test(gpu2), ... dispatch with `sm:'all'` concatenation.
- [ ] https://github.com/NVIDIA/cccl/issues/1618
- [ ] Add a job that builds with `-fsanitize` (https://github.com/NVIDIA/cccl/issues/1645)
- [ ] Update NVBench
- [ ] Test CCCL infra on Windows
- [ ] https://github.com/NVIDIA/cccl/issues/1682
- [ ] Test with the minimum CMake version
- [x] Tag images without os and remove matrix.yaml lookup table ([infra to edit](https://github.com/rapidsai/devcontainers/blob/branch-24.06/.github/workflows/build-test-and-push-linux-image.yml#L140-L141))
- [ ] Test without fp16 ops/convs in a nightly job ([ref](https://github.com/NVIDIA/cccl/pull/1785))
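
For reference, the first task (a GHA job that runs nightly on main) could be sketched roughly as below. This is only a minimal illustration, not the workflow actually used in CCCL: the file name, job name, cron time, runner label, and the `ci/nightly.sh` script are all hypothetical placeholders. The only parts it relies on are standard GitHub Actions features: the `schedule` trigger (which runs against the default branch) and `workflow_dispatch` for manual runs.

```yaml
# Hypothetical file: .github/workflows/nightly.yml
name: nightly

on:
  schedule:
    # Once a day at 07:00 UTC; the time is an arbitrary placeholder.
    # Scheduled workflows run on the latest commit of the default branch.
    - cron: '0 7 * * *'
  # Allow manual runs from the Actions tab for debugging.
  workflow_dispatch:

jobs:
  nightly-build-test:
    # Avoid running the scheduled job on forks.
    if: github.repository == 'NVIDIA/cccl'
    # Placeholder runner; GPU tests would need a GPU-capable runner instead.
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run expensive nightly checks
        # Hypothetical script standing in for the specialized build/test cycle.
        run: ./ci/nightly.sh
```

Keeping the expensive checks behind a separate scheduled workflow means per-PR CI is unaffected; failures here are not tied to a PR, which is why the notification task above matters.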

Additional context

jrhemstad commented 2 months ago

Some of the high level pieces I've envisioned for how this would work: