shiqimei closed this PR 8 months ago
@shiqimei this looks like an important change, should we reopen the PR?
Hi @zoq . Thanks for noticing this PR!
This script is problematic because it fails to properly manage the transfer of build outputs between stages. Feel free to fix it if you prefer that approach.
Another solution I'd recommend now is:
Relocating the installation of platform-agnostic dependencies (git clones, curl downloads, pip packages, apt packages, and Hugging Face models such as collabora/whisperspeech and charactr/vocos-encodec-24khz) to the base image.
Consequently, users will only need to compile tensorrt_llm for their CUDA devices during the Docker build process. While this method will significantly increase the base image's size, it will make the build far more stable: instead of depending on many different network resources, it relies only on a pull from ghcr.io and local cmake builds, which are substantially more reliable. In my own experience, 7-8 failures due to various download errors each forced a full re-download of everything (over 300GB of redundant data in total) before the build finally completed.
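As a sketch of this recommendation, the base image might look like the following (the CUDA tag, package list, and model-download step are illustrative assumptions, not taken from the actual repository):

```dockerfile
# Hypothetical base image: all platform-agnostic dependencies baked in.
FROM nvidia/cuda:12.1.0-devel-ubuntu22.04 AS whisperfusion-base

# System packages (apt), installed once and cached in the base image.
RUN apt-get update && apt-get install -y --no-install-recommends \
        git curl cmake build-essential python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Python packages (pip).
RUN pip3 install --no-cache-dir huggingface_hub

# Pre-download the Hugging Face models so later builds never hit the network.
RUN python3 -c "from huggingface_hub import snapshot_download; \
    snapshot_download('collabora/whisperspeech'); \
    snapshot_download('charactr/vocos-encodec-24khz')"
```

Once an image like this is published (e.g. to ghcr.io), the per-device build only has to compile tensorrt_llm on top of it.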
Additional points to consider:
Even a single command exiting non-zero can derail the entire build, forcing a restart from the beginning. This not only increases the time required to achieve a successful build but also compounds the problems with network dependencies and the risk of download failures.
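With a staged Dockerfile, this can be mitigated by building and caching stages independently (the stage names below are hypothetical):

```shell
# Build and tag only the dependency stage; a later failure in another
# stage will not invalidate this cached layer.
docker build --target deps -t whisperfusion-deps .

# Resume the full build; Docker reuses the cached deps stage
# instead of starting from scratch.
docker build -t whisperfusion .
```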
One of the key advantages of this approach is that users can develop applications based on whisperfusion using this base image without having to manage dependencies themselves. This enables developers to concentrate solely on the whisperfusion codebase, significantly streamlining the development process.
The proposed changes aim to mitigate common issues encountered during the build process, specifically those arising from apt or pip installation failures. By pre-installing these dependencies in the base image, we can avoid these pitfalls and ensure a smoother, more reliable build process.
Key Takeaways:
Introduction of Multi-Stage Builds: Breaks the Dockerfile into distinct stages for dependencies, TensorRT-LLM installation, and WhisperFusion setup.
Efficiency and Caching: By leveraging Docker's caching, we minimize rebuild times. If a stage fails, subsequent builds resume from the last successful stage, not from scratch.
Reduced Build Time and Bandwidth Use: This approach significantly cuts down on the time and bandwidth needed for repeated builds, speeding up development and deployment cycles.
Focused Stages for Easier Maintenance: Separating the build process into stages improves readability and maintainability of the Dockerfile, allowing for easier updates and optimizations.
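The staged layout described above can be sketched as follows (stage names, repository URL, and paths are illustrative assumptions, and the actual build commands are elided):

```dockerfile
# Stage 1: platform-agnostic dependencies (apt, pip, model downloads).
FROM ubuntu:22.04 AS deps
RUN apt-get update && apt-get install -y --no-install-recommends \
        git curl cmake python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Stage 2: compile TensorRT-LLM, the only device-specific (and slowest) step.
FROM deps AS trt
RUN git clone https://github.com/NVIDIA/TensorRT-LLM.git /opt/trt-llm
# ... cmake build for the target CUDA architecture would go here ...

# Stage 3: WhisperFusion itself, copying only the built artifacts it needs.
FROM deps AS whisperfusion
COPY --from=trt /opt/trt-llm /opt/trt-llm
```

Because each `FROM ... AS` stage is cached separately, a failure in the `trt` stage lets the next build resume there rather than re-running `deps`.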