adamoutler opened this issue 5 months ago
Hi @adamoutler - thank you for raising your issue. The error message does not look like a Python version issue to me, but I will make a note to check the compatibility of this plug-in.
The plugin dev identified it as a Python version issue and recommended Ubuntu 23+.
Thanks - I have now read the conversation you had with mamel16. Upgrading the python version requires a lot of testing on my end - this application is a leaning tower of machine learning dependencies - but I will roll it into some other major updates that I have in mind.
Then again, sometimes it just works right away! You are welcome to modify the Dockerfile's base image if you'd like to give it a go.
I understand. I attempted to do so myself already, and many of the pip packages have been moved into the Ubuntu package repositories. While I'm no stranger to dependency management - and I do prefer the stability of package managers over individual packaging - I'm positive there are differences that will hinder me further.
I may have some time to continue later. Even building this on a 20-thread processor took a very long time. Not sure what I can do to speed it up. Any recommendations on making a smaller build from your multi-stage Dockerfile?
Glad that you're trying it out! Please feel welcome to share your progress and experience.
Make sure that you specify the target when you're building, and then it will skip the sections that it doesn't need. I'd imagine you want nvidia-default.
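For example, something along these lines (the image tag here is just illustrative):

docker build --target nvidia-default -t text-generation-webui:local .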
Docker builds cache their steps by default and will only re-build if something changes. Bear this in mind when you're tweaking things - try to put test variations and experiments after the longer build steps.
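As a rough sketch of cache-friendly ordering (a generic example, not this repo's actual steps):

FROM ubuntu:24.04
# Slow, rarely-changing steps first - these layers stay cached between builds
RUN apt-get update && apt-get install --no-install-recommends -y build-essential python3-venv
RUN python3 -m venv /venv
ENV PATH="/venv/bin:$PATH"
# Re-runs only when requirements.txt itself changes
COPY requirements.txt /app/requirements.txt
RUN pip3 install -r /app/requirements.txt
# Frequently-edited files land last, so tweaks only invalidate the final layers
COPY . /app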
The default image takes about five minutes to build on my 5950x - I would expect your times to be similar to that.
So it looks like we're basically required to let Ubuntu manage the virtual environment in 24.04.
####################
### BUILD IMAGES ###
####################
# COMMON
FROM ubuntu:24.04 AS app_base
# Pre-reqs
RUN apt-get update && apt-get install --no-install-recommends -y \
    git vim build-essential python3-dev python3-venv python3-pip \
    python3-virtualenv
# Instantiate venv and pre-activate
# Credit, Itamar Turner-Trauring: https://pythonspeed.com/articles/activate-virtualenv-dockerfile/
RUN virtualenv /venv
ENV VIRTUAL_ENV=/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
...
The only changes required were to update the image, and then use the apt-get package manager for the initial packages instead of pip. I'm 500s into the build on step 16/29. Got other stuff going on. Will check in later with results.
Looks like it's expecting 3.10 but getting 3.10.9?
=> ERROR [app_nvidia 2/2] RUN pip3 install -r /app/requirements.txt 25.0s
------
> [app_nvidia 2/2] RUN pip3 install -r /app/requirements.txt:
6.047 Ignoring llama-cpp-python: markers 'platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"' don't match your environment
6.047 Ignoring llama-cpp-python: markers 'platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"' don't match your environment
6.048 Ignoring llama-cpp-python: markers 'platform_system == "Windows" and python_version == "3.11"' don't match your environment
6.048 Ignoring llama-cpp-python: markers 'platform_system == "Windows" and python_version == "3.10"' don't match your environment
6.049 Ignoring llama-cpp-python-cuda: markers 'platform_system == "Windows" and python_version == "3.11"' don't match your environment
6.049 Ignoring llama-cpp-python-cuda: markers 'platform_system == "Windows" and python_version == "3.10"' don't match your environment
6.049 Ignoring llama-cpp-python-cuda: markers 'platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"' don't match your environment
6.050 Ignoring llama-cpp-python-cuda: markers 'platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"' don't match your environment
6.050 Ignoring llama-cpp-python-cuda-tensorcores: markers 'platform_system == "Windows" and python_version == "3.11"' don't match your environment
6.050 Ignoring llama-cpp-python-cuda-tensorcores: markers 'platform_system == "Windows" and python_version == "3.10"' don't match your environment
6.051 Ignoring llama-cpp-python-cuda-tensorcores: markers 'platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"' don't match your environment
6.051 Ignoring llama-cpp-python-cuda-tensorcores: markers 'platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"' don't match your environment
6.051 Ignoring exllamav2: markers 'platform_system == "Windows" and python_version == "3.11"' don't match your environment
6.052 Ignoring exllamav2: markers 'platform_system == "Windows" and python_version == "3.10"' don't match your environment
6.052 Ignoring exllamav2: markers 'platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"' don't match your environment
6.053 Ignoring exllamav2: markers 'platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"' don't match your environment
6.053 Ignoring exllamav2: markers 'platform_system == "Linux" and platform_machine != "x86_64"' don't match your environment
6.053 Ignoring flash-attn: markers 'platform_system == "Windows" and python_version == "3.11"' don't match your environment
6.054 Ignoring flash-attn: markers 'platform_system == "Windows" and python_version == "3.10"' don't match your environment
6.054 Ignoring flash-attn: markers 'platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"' don't match your environment
6.054 Ignoring flash-attn: markers 'platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"' don't match your environment
6.804 Collecting accelerate==0.30.* (from -r /app/requirements.txt (line 1))
6.996 Downloading accelerate-0.30.1-py3-none-any.whl.metadata (18 kB)
8.758 Collecting aqlm==1.1.5 (from aqlm[cpu,gpu]==1.1.5->-r /app/requirements.txt (line 2))
8.801 Downloading aqlm-1.1.5-py3-none-any.whl.metadata (1.7 kB)
10.30 Collecting auto-gptq==0.7.1 (from -r /app/requirements.txt (line 3))
10.34 Downloading auto_gptq-0.7.1.tar.gz (126 kB)
10.45 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 126.1/126.1 kB 1.1 MB/s eta 0:00:00
10.57 Installing build dependencies: started
21.08 Installing build dependencies: finished with status 'done'
21.08 Getting requirements to build wheel: started
21.56 Getting requirements to build wheel: finished with status 'error'
21.57 error: subprocess-exited-with-error
21.57
21.57 × Getting requirements to build wheel did not run successfully.
21.57 │ exit code: 1
21.57 ╰─> [1 lines of output]
21.57 Building cuda extension requires PyTorch (>=1.13.0) being installed, please install PyTorch first: No module named 'torch'
21.57 [end of output]
21.57
21.57 note: This error originates from a subprocess, and is likely not a problem with pip.
21.57 error: subprocess-exited-with-error
21.57
21.57 × Getting requirements to build wheel did not run successfully.
21.57 │ exit code: 1
21.57 ╰─> See above for output.
21.57
21.57 note: This error originates from a subprocess, and is likely not a problem with pip.
------
Dockerfile:43
--------------------
41 | --index-url https://download.pytorch.org/whl/cu121
42 | # Install oobabooga/text-generation-webui
43 | >>> RUN pip3 install -r /app/requirements.txt
44 |
45 | # Extended
--------------------
ERROR: failed to solve: process "/bin/sh -c pip3 install -r /app/requirements.txt" did not complete successfully: exit code: 1
(base) root@shark-wrangler:~/text-generation-webui-docker# python --version
Python 3.10.9
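Worth noting: the python_version marker compares only major.minor, so 3.10.9 on the host would satisfy "3.10" - the markers above are evaluated inside the container, where Ubuntu 24.04's default python3 is 3.12. A quick check of what the markers actually see, run inside the image:

python3 -c 'import sys; print("{}.{}".format(*sys.version_info[:2]))'   # prints 3.12 on ubuntu:24.04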
Hi @adamoutler, thank you for trying the update and sharing your results - it's useful to know your experience with it. Seems like we did not get lucky with just changing the base image - it was worth a shot, though!
I'm not sure why the venv would need to be managed by Ubuntu... I would prefer to keep it via pip. I'll have a look at installing a different Python version in a venv, and will also consider using a conda-forge environment instead. The first option is a smaller change; the second is a larger rework.
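For the conda-forge option, a minimal sketch of what the base might look like (assuming a Miniforge image; this is not a decided design):

FROM condaforge/miniforge3 AS app_base
# Pin the interpreter independently of the OS python
RUN conda create -y -n textgen python=3.11
ENV PATH="/opt/conda/envs/textgen/bin:$PATH"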
The reason for the apt install is that, when working directly on the OS, you must now use either apt or pip ... --break-system-packages. It's cleaner to use the OS package to install venv, then hop into the venv and install whatever is required.
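i.e. something along these lines on a 24.04 system (a sketch, not the actual Dockerfile steps):

apt-get update && apt-get install --no-install-recommends -y python3-venv
python3 -m venv /venv
. /venv/bin/activate
pip install -r requirements.txt   # inside the venv, no --break-system-packages needed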
I'm not quite sure I follow - isn't that the same as the current approach in the Dockerfile?
No. pip is used to obtain the virtual environment before using it.
https://github.com/Atinoda/text-generation-webui-docker/blob/master/Dockerfile#L11
Thanks for clarifying - I can see the distinction now: just using apt-get to install the virtualenv package, instead of pip. I'm planning to look at this one over the summer break, so it's helpful to have a head-start.
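If I follow correctly, the diff is essentially this (the pip line is paraphrased from the linked Dockerfile, so treat it as an approximation):

# Before: bootstrap virtualenv through pip on the system python
RUN pip3 install virtualenv
# After: let the OS package manager provide it instead
RUN apt-get update && apt-get install --no-install-recommends -y python3-virtualenv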
Seems like there will be a need to manage simultaneous 3.10 and 3.11 environments starting from v1.8 of textgen - it'll be interesting navigating that!
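One conceivable way to juggle both on a single Ubuntu base would be the deadsnakes PPA - purely an assumption on my part, not the project's plan:

RUN apt-get update && apt-get install --no-install-recommends -y software-properties-common \
    && add-apt-repository -y ppa:deadsnakes/ppa && apt-get update \
    && apt-get install --no-install-recommends -y python3.10-venv python3.11-venv
# One venv per interpreter; pick the right one per backend at build or run time
RUN python3.10 -m venv /venv310 && python3.11 -m venv /venv311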
Is there any requirement to use 3.10? Seems like 3.11 is the way.
Actually, I was also wondering: is there any requirement to use Ubuntu? It might be cleaner to start with an Alpine container and build in exactly what is required. I'm considering doing this because...
The release notes mention that 3.10 is required for TensorRT-LLM - apparently, it's the new and fastest backend in the project. I haven't had time to test it yet though.
Regarding Ubuntu as a base: I used to have an Nvidia machine learning image as the base - which produced even larger images - so Ubuntu was the slimming-down effort at the time, plus it brought the benefits of generalisation. I'm a big fan of Alpine-based images for embedded systems and micro-services, and use them regularly there - and you raise valid points about their strengths.
Given that text-generation-webui is for running LLMs - which are absolute resource monsters - I don't feel that the image needs to be super minimal or lightweight. With the extra packages and built-in capabilities of Ubuntu, it also helps with things like installing the Nvidia drivers, as you mentioned - which can be a fiasco. I also feel it's a better basis to use for development (which this image supports with the local source build option), and is more accessible to Linux newcomers.
The build times can indeed be slow and painful for this image and its variants - and that was one of my main motivations for setting up the project and pushing the pre-built versions to docker hub. Textgen is a fast-moving and cutting-edge project rather than mature and stable software - slimming its deployment down and tuning dependencies when they might change tomorrow is making a rod for your own back. Rather the goal is to ease its deployment, increase accessibility, and get people up and running quickly - but still offer the option to get hands-on and build or tweak it yourself.
None of these things are a requirement as such - very few things in life are - but that's my current thinking and motivation: spend more of my finite resources on developing features and less on minimising the image footprint.
I added TensorRT-LLM support in the latest release, but there's not much available for it at the moment, and it's limited to Nvidia hardware only. Therefore, I'm not going to consider its integration a stumbling block in the move to Python 3.11.
The next update will investigate making the shift to 3.11, checking whether there are any other dependency issues or hidden gotchas! TensorRT-LLM will either be dropped or held in a separate variant if the transition is successful.
The below only discourages Alpine, which AFAIK has already been decided against - I just wanted to add additional context as to why it'd be a bad fit.
It might be cleaner to start with an Alpine container and build in exactly what is required.
Alpine has been known to have a variety of problems that are not fun to troubleshoot. DNS and glibc-vs-musl issues are commonly encountered, along with memory leaks (and higher memory usage in general, IIRC) and often slower performance (try building a Rust project and notice it can be 2-3x slower).
The size advantage is often very minor if you build the image properly. I've made images with Fedora as a base that are only slightly larger than the equivalent Alpine image.
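For illustration, the usual multi-stage trick keeps a glibc-based image lean without the musl quirks (a generic sketch; numpy is just a stand-in package):

FROM ubuntu:24.04 AS builder
RUN apt-get update && apt-get install --no-install-recommends -y build-essential python3-venv
RUN python3 -m venv /venv && /venv/bin/pip install --no-cache-dir numpy

FROM ubuntu:24.04
# Runtime needs only the interpreter - compilers and apt caches stay in the builder stage
RUN apt-get update && apt-get install --no-install-recommends -y python3 && rm -rf /var/lib/apt/lists/*
COPY --from=builder /venv /venv
ENV PATH="/venv/bin:$PATH"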
In regards to Python, you can find resources online sharing Alpine-specific woes; runtime performance is one of them, IIRC.
Make sure you properly test/benchmark such a switch before adopting it. It's easy to follow advice parroted online suggesting Alpine for its small size and smaller attack surface - which are easy to observe and reason about - rather than weigh the more project-specific concerns and networking quirks you can run into that waste time troubleshooting.
@polarathene - thank you for your interesting comment! It is a helpful perspective that reinforces my decision to stay away from alpine in this case.
Your commentary regarding non-root vs root containers was good to read. I run a lot of services and there's a mix of the two - root tends to be much easier to set up and manage. My feeling is that non-root maybe has fewer security footguns for a casual user, but it also tends to bring a lot more complexity in day-to-day operations. I had been considering refactoring this image as non-root, but that particular task has been pushed even further down the list now! Besides, it's already complicated enough for people to get their hardware acceleration working properly.
Ubuntu 22.04 LTS comes with Python 3.10. 22.04 LTS is superseded by 24.04 LTS, which ships Python 3.12. Ubuntu 22.04 with Python 3.10 is causing issues with LLM_Web_search.
According to the LLM_Web_search dev, LLM_Web_search requires Python 3.11, and Python 3.10 will not work.
An update to Ubuntu 23.**+ is required for plugins which are used in the textgen webui.
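For reference, the shipped defaults are easy to confirm straight from the base images:

docker run --rm ubuntu:22.04 bash -c 'apt-get update -qq && apt-get install -qq -y python3 >/dev/null && python3 --version'   # Python 3.10.x
docker run --rm ubuntu:24.04 bash -c 'apt-get update -qq && apt-get install -qq -y python3 >/dev/null && python3 --version'   # Python 3.12.x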