Wow @ms1design , thank you for all this! Amazing 🙏
I won't have time to look into all these changes until after GTC, and honestly yes, I do have concerns about the scope of changes and the massive rebuilds they will trigger across all the containers (and needing to revalidate that all of them still work).
I get what you were doing by merging most/all run commands into one layer, however sometimes I still prefer to keep them separate (albeit in as big of chunks as possible) for better debugging when they do fail to build. When that happens, I will start the half-built container from the intermediate build stage and figure out what needs to be run to fix it - then put those changes in the dockerfile when it's working.
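To illustrate the tradeoff (the packages here are just placeholders): merging trades debuggability for fewer layers, while separate chunks keep cached intermediates around.

```dockerfile
# merged: fewer layers and a smaller image, but when the last command
# fails, the whole layer rebuilds and there is no cached intermediate
# image to start a shell in
RUN apt-get update && \
    apt-get install -y --no-install-recommends git cmake && \
    rm -rf /var/lib/apt/lists/* && \
    pip3 install --no-cache-dir ninja

# separate (in as-big-as-possible chunks): if the second RUN fails, the
# first is still cached, so the half-built image can be started with
# `docker run -it <image-id> bash` to work out the fix interactively
RUN apt-get update && \
    apt-get install -y --no-install-recommends git cmake && \
    rm -rf /var/lib/apt/lists/*
RUN pip3 install --no-cache-dir ninja
```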
Also for the big packages, I have been migrating to having separate "builder" vs "deploy" dockerfiles, where the deploy dockerfile copies the wheel out of the builder - but they are not stages built in the same dockerfile (this way not everyone needs to build the builder container, and can just use the builders that I have pushed to dockerhub). I think the more normal multi-stage way you are doing it here is a good intermediary step, however that way everyone still needs to build the wheels.
See the pytorch, MLC, and FAISS containers for some examples of that separate builder vs deploy dockerfile thing. I pick which ones I do that for pretty strategically (basically, if the wheel takes more than an hour to build on AGX, it is infeasible for Nano users to build).
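Roughly, that split looks like the sketch below - the image and package names are made up for illustration, not the actual repo contents:

```dockerfile
# Dockerfile.builder - built and pushed to DockerHub by the maintainer,
# so most users never have to compile the wheel themselves
FROM nvcr.io/nvidia/l4t-jetpack:r36.2.0
RUN git clone --depth=1 https://github.com/example/somepkg /opt/somepkg && \
    pip3 wheel --no-deps --wheel-dir=/opt/wheels /opt/somepkg
```

```dockerfile
# Dockerfile - the deploy side copies the prebuilt wheel straight out of
# the published builder image instead of building it locally
FROM nvcr.io/nvidia/l4t-jetpack:r36.2.0
COPY --from=dustynv/somepkg-builder:r36.2.0 /opt/wheels/ /opt/wheels/
RUN pip3 install /opt/wheels/*.whl
```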
Anyways, thanks again!! Will look into these more closely in a couple weeks.
Hey Dusty!
Thanks for the fast feedback - couldn't agree more with you! Your view on the wheel build strategy in a separate dockerfile sounds really reasonable. I think it's not a big effort to refactor some of the changes in my PR to follow MLC as an example.
When it comes to reducing the number of layers - maybe I went too far with this contraption :) I'll try to find some spare time next week to adjust my PR following your suggestions.
Thanks again for your time and all suggestions!
Ironically, I began hitting the docker max depth exceeded limit when working on the `local_llm` container 🤣
So in commit https://github.com/dusty-nv/jetson-containers/commit/b44b58e8925dc81399cf1f59e03f09d379d2aab8 I reduced the number of layers in many of those dependencies (but not the ones in the stable-diffusion-webui / text-generation-webui chain - I will further integrate your changes for those after the GTC timeframe; those are still building fine for now 🤞)
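For anyone watching their own layer budget, an image's layer count can be eyeballed with something like this (the tag is illustrative):

```sh
# counts history entries, including empty metadata layers
docker history --format '{{.ID}}' dustynv/local_llm:r36.2.0 | wc -l
```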
I have also been simmering on this idea of having the builder containers automatically push their wheels to a custom pypi server that I would run, so anyone could just pip install them with --extra-index-url
> I have also been simmering on this idea of having the builder containers automatically push their wheels to a custom pypi server that I would run, so anyone could just pip install them with --extra-index-url
Definitely upvote this! Thanks
@dusty-nv Hey, thanks for the feedback - as you can see, I haven't had any spare time yet to modify this PR.
That's unexpected - tbh I didn't know there is a finite max number of docker layers :) The docker image size reduction was a minimal gain (around 700MB on `text-generation-webui`) to the point that I was planning to revert it… :) But now I will hold my horses - on the other hand, it reduces the number of layers by more than half.
I also like the idea of uploading wheels to a pypi server - do you think we could also cover a more private setup, like pushing the wheel to any private pypi registry based on a URL passed as an env var?
I'm touching on this because I use a fork of this repo in a CI/CD env where I can host a private pypi registry. I'm quite interested in that - this would be a huge improvement!
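If that lands, consuming the wheels from either the public server or a private registry could be as simple as this sketch (both URLs and the package name are placeholders):

```sh
# one-off: pass the extra wheel index on the command line
pip3 install somepkg --extra-index-url https://pypi.example.com/simple

# CI/CD-friendly: pip also honors the equivalent environment variable,
# so a private registry URL can be injected without editing Dockerfiles
export PIP_EXTRA_INDEX_URL=https://pypi.internal.example/simple
pip3 install somepkg
```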
Back to you, @dusty-nv :)
I just updated this PR a bit:

- `cuda` container - proper Ubuntu pin version based on the L4T version (see the pin sketch below);
- `dev` branch;
- `build` stages (later I plan to do this in a standalone `Dockerfile.builder`, taking the `mlc` container as an example).

This PR is in the stage of transition to many small, cherry-picked PRs and will be closed at the end.
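A sketch of how the Ubuntu-pin selection could be parameterized - the `ARG` names are mine, not necessarily what the PR does:

```dockerfile
FROM ubuntu:22.04
# pick the CUDA apt-repo pin matching the base Ubuntu release / architecture
ARG DISTRO=ubuntu2204
ARG ARCH=sbsa
ADD https://developer.download.nvidia.com/compute/cuda/repos/${DISTRO}/${ARCH}/cuda-${DISTRO}.pin \
    /etc/apt/preferences.d/cuda-repository-pin-600
```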
@dusty-nv separate PRs have been created for most of the containers. Please feel free to test, comment, or even modify my PRs after GTC. I'm closing this one.
Hi @dusty-nv 👋
This PR aims to enhance the efficiency and clarity of `Dockerfiles` across multiple packages by reducing layer complexity, fixing formatting issues, introducing standardized `build` and `install` stages where applicable, and ensuring compatibility with newer environments (`JP6`). Additionally, it introduces environment variables where necessary for better containerization practices. Notable changes include the adoption of consistent practices for building pip wheels, the inclusion of necessary configurations for various packages, and the integration of the `openai-triton` package as a dependency where appropriate.

@dusty-nv please don't be scared by this PR size! 🙏 I would be very happy if any part of it were cherry-picked to introduce some improvements to the `jetson-containers` repository.

### Changes Summary

- Refactored `Dockerfiles` across packages to streamline the build process and improve readability.
- Updated `git` branches, removed unnecessary patches, and introduced new configuration options where required to optimize package functionality.

### Impact

- The refactored `Dockerfiles` improve containerization practices, making it easier to deploy and manage packages in various environments.
- Deployment of the `stable-diffusion-webui` and `text-generation-webui` packages becomes more straightforward and modular.

### Detailed changes
- `build-essential`
- `cmake_pip`
- `cuda`
  - replaced `https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/arm64/cuda-ubuntu2004.pin` with `https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/sbsa/cuda-ubuntu2204.pin` in favour of `JP6` & `ubuntu 22.04`
- `cudnn`
- `pycuda`
  - added `build` & `install` stages in `Dockerfile` for building the `pip` wheel (see the multi-stage sketch after this list)
- `stable-diffusion-webui`
  - added `install_sd_extensions.sh` script which can download & install `stable-diffusion-webui` extensions during build time from the `Dockerfile`
- `stable-diffusion`
- `auto_awq`
  - added `build` & `install` stages in `Dockerfile` for building the `pip` wheel
- `auto_gptq`
  - added `build` & `install` stages in `Dockerfile` for building the `pip` wheel
- `bitsandbytes`
  - updated `config.py` in order to fix `JP6` / `cuda 12.2` builds
  - added `build` & `install` stages in `Dockerfile` for building the `pip` wheel
- `exllama`
  - added `build` & `install` stages in `Dockerfile` for building the `pip` wheel
- `exllama_v2`
- `gptq-for-llama`
  - updated `config.py` in order to enable `openai-triton` based builds - new `gptq-for-llama:triton` package
- `huggingface_hub`
- `llama_cpp`
  - updated `git` branch to `main`
  - removed unnecessary `git patch`
- `text-generation-webui`
  - added `bitsandbytes` and `openai-triton` dependent packages
  - added `torch-grammar`, `sentence-transformers` and `flash-attention` (`v1` & `v2`) support
  - `one_click` script
  - `settings.json`. Instead we can use a `settings.yaml` file based on this template, mounted as a docker volume (see the mount sketch after this list)
- `transformers`
- `xformers`
  - `pytorch` re-installation
- `numpy`
- `onnx`
- `onnxruntime`
  - updated `config.py` in order to enable package installation from `pip` wheels (from Jetson Zoo)
  - added `openai-triton` package as dependency
- `openai-triton`
  - added `build` & `install` stages in `Dockerfile` for building the `pip` wheel
- `opencv`
- `python`
- `pytorch`
- `torchvision`
  - added `build` & `install` stages in `Dockerfile` for building the `pip` wheel
- `rust`
- `tensorrt`
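To make the recurring "`build` & `install` stages" bullets concrete, here is a minimal multi-stage sketch of the pattern - the base image and package name are placeholders, not the exact ones in the PR:

```dockerfile
ARG BASE_IMAGE=nvcr.io/nvidia/l4t-jetpack:r36.2.0

# the "build" stage compiles the pip wheel once, in isolation
FROM ${BASE_IMAGE} AS build
RUN git clone --depth=1 https://github.com/example/somepkg /opt/somepkg && \
    pip3 wheel --no-deps --wheel-dir=/opt/wheels /opt/somepkg

# the "install" stage only ever sees the finished wheel, keeping
# build toolchains and sources out of the final image
FROM ${BASE_IMAGE} AS install
COPY --from=build /opt/wheels/ /opt/wheels/
RUN pip3 install /opt/wheels/*.whl
```

And for the `settings.yaml` item under `text-generation-webui`, the volume mount presumably looks something like this (the container-side path and tag are assumptions):

```sh
# mount a local settings.yaml over the one expected inside the container
docker run -it --rm \
  -v $(pwd)/settings.yaml:/opt/text-generation-webui/settings.yaml \
  dustynv/text-generation-webui:r36.2.0
```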