coreweave / ml-containers

MIT License

feat(torch): Nightly PyTorch Builds #37

Closed Eta0 closed 1 year ago

Eta0 commented 1 year ago

Nightly PyTorch Builds

This change adds experimental container builds:

These builds are documented after torch and torch-extras in the README index.

Versioning

Structural Changes

Originally, torch-nightly.yml called torch-base.yml and torch-nccl.yml as reusable workflows to inherit their configurations, but that ran up against the GitHub Actions reusable workflow call depth limit of three because of the chain torch-nightly.yml > torch-____.yml > torch.yml > build.yml. The current setup flattens the call chain enough to stay under the limit while still sharing one source of configuration information with the other two workflows, so that all three stay in sync.
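For illustration, the difference between the two layouts can be sketched as a small call graph. This is only a model, not code from the repository: the original graph follows the chain described above, while the flattened graph (torch-nightly.yml calling torch.yml directly) is an assumption about what "flattened" means here. The function name chain_length is hypothetical, and the sketch counts workflow files on the longest chain without modelling how GitHub itself counts nesting depth.

```python
# Hypothetical model of the workflow call graphs; not taken from the repository.
# Each key is a workflow file, each value lists the reusable workflows it calls.

# Original layout, following the chain named in the description.
ORIGINAL = {
    "torch-nightly.yml": ["torch-base.yml", "torch-nccl.yml"],
    "torch-base.yml": ["torch.yml"],
    "torch-nccl.yml": ["torch.yml"],
    "torch.yml": ["build.yml"],
    "build.yml": [],
}

# ASSUMPTION: in the flattened layout the nightly workflow calls torch.yml directly.
FLATTENED = {
    "torch-nightly.yml": ["torch.yml"],
    "torch.yml": ["build.yml"],
    "build.yml": [],
}


def chain_length(graph, root):
    """Return the number of workflow files on the longest call chain from root."""
    callees = graph[root]
    return 1 + max((chain_length(graph, callee) for callee in callees), default=0)


if __name__ == "__main__":
    for name, graph in (("original", ORIGINAL), ("flattened", FLATTENED)):
        print(name, chain_length(graph, "torch-nightly.yml"))
```

Under these assumptions the flattened layout has one workflow file fewer on its longest chain than the original.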

Build Process Improvements

Published Images

Some container images are already published to ml-containers/pkgs/container/ml-containers%2Fnightly-torch. Because of a bug that has since been fixed, some nightly-torch-extras container images were published under ml-containers/pkgs/container/ml-containers%2Ftorch-extras instead of under their own container name; new builds use the name nightly-torch-extras. The cron schedule that builds all of these will start once this is merged to main.
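As a side note on the package paths above, the %2F in them is a percent-encoded slash. A minimal sketch using Python's standard library shows the round trip; the helper name encode_package_name is hypothetical, and the assumption is that the package name is the plain string ml-containers/nightly-torch.

```python
from urllib.parse import quote, unquote


def encode_package_name(name):
    """Percent-encode a package name so that '/' becomes '%2F'."""
    # safe="" makes quote() encode the slash, which it leaves alone by default.
    return quote(name, safe="")


if __name__ == "__main__":
    encoded = encode_package_name("ml-containers/nightly-torch")
    print(encoded)
    print(unquote(encoded))
```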