coreweave / ml-containers


build(vllm-tensorizer): Compile `vllm-flash-attn` from source #70

Closed by Eta0 2 months ago

Eta0 commented 2 months ago

Compile vllm-flash-attn from Source

vLLM replaced its use of the regular `flash-attn` library with its own `vllm-flash-attn` fork in vllm-project/vllm#4686, which, at present, is fairly straightforward to compile. This change builds it from source for compatibility with the `ml-containers/torch` base images.
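
For illustration, here is a minimal sketch of what such a from-source build could look like as a Dockerfile layer. The base image tag, repository URL, ref, and build tooling are assumptions for the sake of the example (the `vllm-flash-attn` fork is assumed to live at vllm-project/flash-attention); they are not the values or steps used in the actual `vllm-tensorizer` build.

```Dockerfile
# Illustrative sketch only: BASE_IMAGE would be an ml-containers/torch tag,
# supplied at build time via --build-arg.
ARG BASE_IMAGE
FROM ${BASE_IMAGE}

# Assumed source of the vllm-flash-attn fork; a real build would pin
# VLLM_FLASH_ATTN_REF to a specific commit for reproducibility.
ARG VLLM_FLASH_ATTN_REPO=https://github.com/vllm-project/flash-attention.git
ARG VLLM_FLASH_ATTN_REF=main

# Assumes git, a CUDA toolkit, and standard Python build tooling are already
# present in the base image. --no-build-isolation compiles the extension
# against the torch already installed there rather than a separate,
# wheel-pinned torch pulled in by an isolated build environment.
RUN git clone --recursive "${VLLM_FLASH_ATTN_REPO}" /tmp/vllm-flash-attn && \
    git -C /tmp/vllm-flash-attn checkout "${VLLM_FLASH_ATTN_REF}" && \
    python3 -m pip install --no-build-isolation --no-cache-dir /tmp/vllm-flash-attn && \
    rm -rf /tmp/vllm-flash-attn
```

In a sketch like this, building against the torch that ships in the base image (rather than a separately resolved wheel) is the point of compiling from source: it keeps the compiled extension ABI-compatible with the `ml-containers/torch` base images described above.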

This is a prerequisite for updating our `vllm-tensorizer` images to include the newest versions of vLLM.