Streamlined Dockerfiles and Improved Build Process of `text-generation-webui` & `stable-diffusion-webui` packages

ms1design commented 4 months ago

Hi @dusty-nv 👋

This PR aims to enhance the efficiency and clarity of Dockerfiles across multiple packages by reducing layer complexity, fixing formatting issues, and introducing standardized build and install stages where applicable and ensuring compatibility with newer environments (JP6).

Additionally, it introduces environmental variables where necessary for better containerization practices. Notable changes include the adoption of consistent practices for building pip wheels, the inclusion of necessary configurations for various packages, and the integration of openai-triton package as a dependency where appropriate.

@dusty-nv please don't be scared by this PR size! 🙏 I would be very happy if any part of it would be cherry-picked to introduce some improvements to jetson-containers repository.

Changes Summary

Reduced Layer Complexity: Simplified Dockerfiles across packages to streamline the build process and improve readability.
Fixed Formatting: Ensured consistent and clean formatting for better maintainability.
Introduced Build and Install Stages: Implemented build and install stages in Dockerfiles for building pip wheels, enhancing reproducibility and ease of use.
Environmental Variables: Added environmental variables where necessary to enhance containerization practices.
Updated Package Configurations: Updated configurations and dependencies for packages to align with the latest standards and requirements.
Standardized Extensions Installation: Introduced a script for downloading and installing extensions during build time for the stable-diffusion-webui package, improving modularity and ease of extension management.
Dependency Management: Added dependencies such as openai-triton where necessary to enable seamless integration and functionality.
Configuration Updates: Updated default git branches, removed unnecessary patches, and introduced new configuration options where required to optimize package functionality.

Impact

Improved Build Efficiency: By reducing layer complexity and introducing standardized build stages, the build process becomes more efficient and easier to maintain, not to mention obvious time & final docker image size savings.
Enhanced Containerization Practices: The inclusion of environmental variables and streamlined Dockerfiles improves containerization practices, making it easier to deploy and manage packages in various environments.
Simplified Extension Management: With the introduction of a script for extension installation, managing extensions for the stable-diffusion-webui and text-generation-webui packages becomes more straightforward and modular.
Updated Dependencies: By adding necessary dependencies and configurations, the packages are aligned with the latest standards and requirements, ensuring optimal functionality and compatibility.

Detailed changes

`build-essential`

Added environmental variables
Reduced number of layers
Fixed formatting

`cmake_pip`

Reduced number of layers
Fixed formatting

`cuda`

Added environmental variables
Reduced number of layers
Fixed formatting
replaced https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/arm64/cuda-ubuntu2004.pin with https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/sbsa/cuda-ubuntu2204.pin in favour of JP6 & ubuntu 22.04

`cudnn`

Reduced number of layers
Fixed formatting

`pycuda`

Introduced build & install stages in Dockerfile for building pip wheel
Reduced number of layers
Fixed formatting

`stable-diffusion-webui`

Added the install_sd_extensions.sh script which can download & install stable-diffusion-webui extensions during the build time from Dockerfile
Extended default installed extensions to:
Reduced number of layers
Fixed formatting

`stable-diffusion`

Reduced number of layers
Fixed formatting

`auto_awq`

Introduced build & install stages in Dockerfile for building pip wheel
Reduced number of layers
Fixed formatting

`auto_gptq`

Introduced build & install stages in Dockerfile for building pip wheel
Reduced number of layers
Fixed formatting

`bitsandbytes`

Added config.py in order to fix JP6/cuda 12.2 builds
Introduced build & install stages in Dockerfile for building pip wheel
Reduced number of layers
Fixed formatting

`exllama`

Introduced build & install stages in Dockerfile for building pip wheel
Reduced number of layers
Fixed formatting

`exllama_v2`

Reduced number of layers
Fixed formatting

`gptq-for-llama`

Added config.py in order to enable openai-triton based builds - new gptq-for-llama:triton package
Fixed tests logging
Reduced number of layers
Fixed formatting

`huggingface_hub`

Reduced number of layers
Fixed formatting

`llama_cpp`

Changed default git branch to main
Removed unecessary git patch
Reduced number of layers
Fixed formatting

`text-generation-webui`

Reduced number of layers
Fixed formatting
Added bitsandbytes and openai-triton dependant packages
Added torch-grammar, sentence-transformers and flash-attention (v1 & v2) support
Create symbolic links instead of files duplication
Introduced a new way of automatic extensions installation using native one_click script
Removed settings.json. Instead we can use settings.yaml file based on this template mounted as a docker volume:

-v /path/to/settings.yaml:/opt/text-generation-webui/settings.yaml

`transformers`

Reduced number of layers
Fixed formatting

`xformers`

Reduced number of layers
Fixed unecessary pytorch re-installation
Fixed formatting

`numpy`

Reduced number of layers
Fixed formatting

`onnx`

Reduced number of layers
Fixed formatting

`onnxruntime`

Added config.py in order to enable package installation from pip wheels (from Jetson Zoo)
Reduced number of layers
Fixed formatting
Added openai-triton package as dependency

`openai-triton`

Introduced build & install stages in Dockerfile for building pip wheel

`opencv`

Reduced number of layers
Fixed formatting

`python`

Added environmental variables
Reduced number of layers
Fixed formatting

`pytorch`

Reduced number of layers
Fixed formatting

`torchvision`

Introduced build & install stages in Dockerfile for building pip wheel
Reduced number of layers
Fixed formatting

`rust`

Reduced number of layers
Fixed formatting

`tensorrt`

Reduced number of layers
Fixed formatting

dusty-nv commented 4 months ago

Wow @ms1design , thank you for all this! Amazing 🙏

I won't have time to look into all these changes until after GTC, and honestly yes I do have concerns about the scope of changes and the massive rebuilds they will trigger across all the containers (and needing to revalidate all of them are still working)

I get what you were doing by merging most/all run commands into one layer, however sometimes I still prefer to keep them separate (albiet in as big of chunks as possible) for better debugging when they do fail to build. When that happens, I will start the half-built container during the intermediate build stage and figure out what needs to be run to fix it - then put those changes in the dockerfile when it's working.

Also for the big packages, I have been migrating to having separate "builder" vs "deploy" dockerfiles, where the deploy dockerfile copies the wheel out of the builder - but they are not stages built in the same dockerfile (this way not everyone needs to build the builder container, and can just use the builders that I have pushed to dockerhub). I think the more normal multi-stage way you are doing it here is a good intermediary step, however that way everyone still needs to build the wheels.

See the pytorch, MLC, and FAISS containers for some examples of that separate builder vs deploy dockerfile thing. I pick those pretty strategically which ones I do that for (basically if the wheel takes more than an hour to build on AGX, it is infeasible for Nano users to build)

Anyways, thanks again!! Will look into these more closely in a couple weeks.

ms1design commented 4 months ago

Hey Dusty!

Thanks for fast feedback- can’t agree more with you! When you shared your view on wheels build strategy in separate dockerfile - it sounds really reasonable. I think thats not a big effort to refactor some of the changes in my PR to follow MLC as a example.

When it comes to reducing the number of layers - maybe I went too far with this contraption :) I’ll try to find some spare time in next week to adjust my PR following your suggestions.

Thanks again for your time and all suggestions!

dusty-nv commented 4 months ago

Ironically, I began hitting docker max depth exceeded limit when working on local_llm container 🤣

So in commit https://github.com/dusty-nv/jetson-containers/commit/b44b58e8925dc81399cf1f59e03f09d379d2aab8 reduced the number of layers in many of those dependencies (but not the ones in the stable-diffusion-webui / text-generation-webui chain - I will further integrate your changes for those after GTC timeframe, those are still building fine for now 🤞

I have also been simmering on this idea of having the builder containers automatically push their wheels to a custom pypi server that I would run, so anyone could just pip install them with --extra-index-url

bryanhughes commented 4 months ago

I have also been simmering on this idea of having the builder containers automatically push their 
wheels to a custom pypi server that I would run, so anyone could just pip install them with --extra-index-url

Definitely upvote this! Thanks

ms1design commented 4 months ago

@dusty-nv Hey, thanks for feedback - I haven’t had any spare time yet as you can see to modify this PR.

That’s unexpected what you write- tbh I didn’t know that we have finite max number of docker layers :) The docker image size reduction was a minimal gain (around 700MB on text-generation-ui) to the point that I was planing to revert it… :) But now I will hold my horses - for the other hand it reduces the number of layers by more than half.

I also like the idea of uploading wheels to pypi server - do you think we could also cover more private setup - like pushing wheel to any private pypi registry based on the URL passed as a env var?

I’m touching this because I use the fork of this repo in a CI/CD env where I can host private pypi registry. I’m quite interested in that - this would be huge improvement!

ms1design commented 4 months ago

Back to you, @dusty-nv :)

I just updated a bit this PR:

added fix for cuda container - proper Ubuntu pin version based on L4T version;
merged your changes from dev branch;
removed introduced build stages (later I plan to do this in standalone Dockerfile.builder taking mlc container as an example;
small cleanup

ms1design commented 4 months ago

This PR is in the stage of transition to many, small cherry-picked PR's and will be closed at the end.

ms1design commented 4 months ago

@dusty-nv separate PRs created for most of the containers. Please feel free after the GTC to test, comment or even modify my PRs. I'm closing this one.

dusty-nv / jetson-containers