coreweave / ml-containers

MIT License

feat: PyTorch Extras Container #26

Closed Eta0 closed 1 year ago

Eta0 commented 1 year ago

torch-extras Container

This PR adds a new container, ml-containers/torch-extras, which is ml-containers/torch plus the supplementary libraries DeepSpeed and flash-attention. The code is based on #21, but significantly generalized, with the finetuner application-specific parts removed.

Rationale

DeepSpeed and flash-attention both require the CUDA development tools to install properly, which complicates using them with anything but an nvidia/cuda:...-devel based image. Optionally bundling them with our ml-containers/torch containers yields still-lightweight images that can use these powerful libraries without shipping the full CUDA development toolkit. It also cuts build time for downstream Dockerfiles, since flash-attention takes a long time to compile wherever it is included.
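To illustrate the layering idea, a minimal sketch of how such an image could be assembled (image names, build-arg names, and install flags here are illustrative assumptions, not the actual Dockerfile from this PR):

```dockerfile
# Hypothetical sketch: layer DeepSpeed and flash-attention onto a torch image.
# BASE_IMAGE is an assumed build argument so the same Dockerfile can target
# either the torch:base or torch:nccl flavour.
ARG BASE_IMAGE=ghcr.io/coreweave/ml-containers/torch:base
FROM ${BASE_IMAGE}

# Both libraries compile CUDA extensions at install time, so the CUDA
# development toolkit must be present during this step; a multi-stage build
# (omitted here) can keep it out of the final runtime image.
RUN pip install --no-cache-dir deepspeed && \
    pip install --no-cache-dir flash-attn --no-build-isolation
```

The key point is that the expensive compilation happens once here, rather than in every downstream Dockerfile that wants these libraries.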

Structure

ml-containers/torch-extras is separated out as its own container, unlike the tag-differentiated torch:base and torch:nccl flavours of the baseline torch image. The torch-extras images are simply layers on top of the torch:base and torch:nccl images, and are built as a second CI step immediately after either of those is built. Since DeepSpeed and flash-attention compatibility may lag behind PyTorch releases, this secondary build step can be temporarily disabled via flags in torch-base.yml and torch-nccl.yml until the libraries catch up.
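The chained-build arrangement could look roughly like the following GitHub Actions excerpt (job names, the flag name, and workflow paths are hypothetical placeholders, not the actual contents of torch-base.yml):

```yaml
# Hypothetical excerpt of torch-base.yml: the torch-extras job runs as a
# second step after the base build, gated by a flag that can be flipped off
# while DeepSpeed/flash-attention lag behind a new PyTorch release.
jobs:
  build-torch-base:
    uses: ./.github/workflows/build.yml

  build-torch-extras:
    needs: build-torch-base
    # Assumed repository variable acting as the enable/disable flag.
    if: ${{ vars.BUILD_TORCH_EXTRAS != 'false' }}
    uses: ./.github/workflows/torch-extras.yml
```

Because the extras job is keyed off the base job with `needs`, every successful torch build automatically triggers a matching torch-extras build unless the flag disables it.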

I welcome comments and suggestions on this build process and structure, because it involves tradeoffs: it guarantees that the torch-extras containers are rebuilt, whenever possible, on every new torch image update, but it makes it harder to build the torch-extras containers standalone, if desired.