adamoutler opened this issue 5 months ago
Hi @adamoutler - thank you for raising your issue. The error message does not look like a Python version issue to me, but I will make a note to check the compatibility of this plug-in.
The plugin dev identified it as a Python version issue and recommended Ubuntu 23+.
Thanks - I have now read the conversation you had with mamel16. Upgrading the python version requires a lot of testing on my end - this application is a leaning tower of machine learning dependencies - but I will roll it into some other major updates that I have in mind.
Then again, sometimes it just works right away! You are welcome to modify the Dockerfile's base image if you'd like to give it a go.
I understand. I attempted to do so myself already, and many of the pip packages have been moved into the Ubuntu package repositories. While I'm no stranger to dependency management - and I do prefer the stability of package managers over individual packaging - I'm positive there are differences that will hinder me further.
I may have some time to continue later. Even building this on a 20-thread processor took a very long time. Not sure what I can do to speed it up. Any recommendations on making a smaller build from your multi-stage Dockerfile?
Glad that you're trying it out! Please feel welcome to share your progress and experience.
Make sure that you specify the target when you're building, and then it will skip the sections that it doesn't need. I'd imagine you want nvidia-default.
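For example, something along these lines (the image tag here is just illustrative):

docker build --target nvidia-default -t text-generation-webui:local .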
Docker builds cache their steps by default and will only re-build if something changes. Bear this in mind when you're tweaking things - try to put test variations and experiments after the longer build steps.
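As a rough sketch of cache-friendly ordering (a generic example, not this repo's actual steps):

FROM ubuntu:24.04
# Slow, rarely-changing steps first - these layers stay cached between builds
RUN apt-get update && apt-get install --no-install-recommends -y build-essential python3-venv
RUN python3 -m venv /venv
ENV PATH="/venv/bin:$PATH"
# Re-runs only when requirements.txt itself changes
COPY requirements.txt /app/requirements.txt
RUN pip3 install -r /app/requirements.txt
# Frequently-edited files land last, so tweaks only invalidate the final layers
COPY . /app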
The default image takes about five minutes to build on my 5950x - I would expect your times to be similar to that.
So it looks like we're basically required to let Ubuntu manage the virtual environment in 24.04.
####################
### BUILD IMAGES ###
####################
# COMMON
FROM ubuntu:24.04 AS app_base
# Pre-reqs
RUN apt-get update && apt-get install --no-install-recommends -y \
    git vim build-essential python3-dev python3-venv python3-pip \
    python3-virtualenv
# Instantiate venv and pre-activate
# Credit, Itamar Turner-Trauring: https://pythonspeed.com/articles/activate-virtualenv-dockerfile/
RUN virtualenv /venv
ENV VIRTUAL_ENV=/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
...
The only changes required were to update the image, and then use the apt-get package manager for the initial packages instead of pip. I'm 500s into the build on step 16/29. Got other stuff going on. Will check in later with results.
Looks like it's expecting 3.10 but getting 3.10.9?
=> ERROR [app_nvidia 2/2] RUN pip3 install -r /app/requirements.txt 25.0s
------
> [app_nvidia 2/2] RUN pip3 install -r /app/requirements.txt:
6.047 Ignoring llama-cpp-python: markers 'platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"' don't match your environment
6.047 Ignoring llama-cpp-python: markers 'platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"' don't match your environment
6.048 Ignoring llama-cpp-python: markers 'platform_system == "Windows" and python_version == "3.11"' don't match your environment
6.048 Ignoring llama-cpp-python: markers 'platform_system == "Windows" and python_version == "3.10"' don't match your environment
6.049 Ignoring llama-cpp-python-cuda: markers 'platform_system == "Windows" and python_version == "3.11"' don't match your environment
6.049 Ignoring llama-cpp-python-cuda: markers 'platform_system == "Windows" and python_version == "3.10"' don't match your environment
6.049 Ignoring llama-cpp-python-cuda: markers 'platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"' don't match your environment
6.050 Ignoring llama-cpp-python-cuda: markers 'platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"' don't match your environment
6.050 Ignoring llama-cpp-python-cuda-tensorcores: markers 'platform_system == "Windows" and python_version == "3.11"' don't match your environment
6.050 Ignoring llama-cpp-python-cuda-tensorcores: markers 'platform_system == "Windows" and python_version == "3.10"' don't match your environment
6.051 Ignoring llama-cpp-python-cuda-tensorcores: markers 'platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"' don't match your environment
6.051 Ignoring llama-cpp-python-cuda-tensorcores: markers 'platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"' don't match your environment
6.051 Ignoring exllamav2: markers 'platform_system == "Windows" and python_version == "3.11"' don't match your environment
6.052 Ignoring exllamav2: markers 'platform_system == "Windows" and python_version == "3.10"' don't match your environment
6.052 Ignoring exllamav2: markers 'platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"' don't match your environment
6.053 Ignoring exllamav2: markers 'platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"' don't match your environment
6.053 Ignoring exllamav2: markers 'platform_system == "Linux" and platform_machine != "x86_64"' don't match your environment
6.053 Ignoring flash-attn: markers 'platform_system == "Windows" and python_version == "3.11"' don't match your environment
6.054 Ignoring flash-attn: markers 'platform_system == "Windows" and python_version == "3.10"' don't match your environment
6.054 Ignoring flash-attn: markers 'platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"' don't match your environment
6.054 Ignoring flash-attn: markers 'platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"' don't match your environment
6.804 Collecting accelerate==0.30.* (from -r /app/requirements.txt (line 1))
6.996 Downloading accelerate-0.30.1-py3-none-any.whl.metadata (18 kB)
8.758 Collecting aqlm==1.1.5 (from aqlm[cpu,gpu]==1.1.5->-r /app/requirements.txt (line 2))
8.801 Downloading aqlm-1.1.5-py3-none-any.whl.metadata (1.7 kB)
10.30 Collecting auto-gptq==0.7.1 (from -r /app/requirements.txt (line 3))
10.34 Downloading auto_gptq-0.7.1.tar.gz (126 kB)
10.45 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 126.1/126.1 kB 1.1 MB/s eta 0:00:00
10.57 Installing build dependencies: started
21.08 Installing build dependencies: finished with status 'done'
21.08 Getting requirements to build wheel: started
21.56 Getting requirements to build wheel: finished with status 'error'
21.57 error: subprocess-exited-with-error
21.57
21.57 × Getting requirements to build wheel did not run successfully.
21.57 │ exit code: 1
21.57 ╰─> [1 lines of output]
21.57 Building cuda extension requires PyTorch (>=1.13.0) being installed, please install PyTorch first: No module named 'torch'
21.57 [end of output]
21.57
21.57 note: This error originates from a subprocess, and is likely not a problem with pip.
21.57 error: subprocess-exited-with-error
21.57
21.57 × Getting requirements to build wheel did not run successfully.
21.57 │ exit code: 1
21.57 ╰─> See above for output.
21.57
21.57 note: This error originates from a subprocess, and is likely not a problem with pip.
------
Dockerfile:43
--------------------
41 | --index-url https://download.pytorch.org/whl/cu121
42 | # Install oobabooga/text-generation-webui
43 | >>> RUN pip3 install -r /app/requirements.txt
44 |
45 | # Extended
--------------------
ERROR: failed to solve: process "/bin/sh -c pip3 install -r /app/requirements.txt" did not complete successfully: exit code: 1
(base) root@shark-wrangler:~/text-generation-webui-docker# python --version
Python 3.10.9
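Worth noting: the python_version marker compares only major.minor, so 3.10.9 on the host would satisfy "3.10" - the markers above are evaluated inside the container, where Ubuntu 24.04's default python3 is 3.12. A quick check of what the markers actually see, run inside the image:

python3 -c 'import sys; print("{}.{}".format(*sys.version_info[:2]))'   # prints 3.12 on ubuntu:24.04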
Hi @adamoutler, thank you for trying the update and sharing your results - it's useful to know your experience with it. Seems like we did not get lucky with just changing the base image - it was worth a shot, though!
I'm not sure why the venv would need to be managed by Ubuntu... I would prefer to keep it via pip. I'll have a look at installing a different Python version in a venv, and will also consider using a conda-forge environment instead. The first option is a smaller change; the second is a larger rework.
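For the conda-forge option, a minimal sketch of what the base might look like (assuming a Miniforge image; this is not a decided design):

FROM condaforge/miniforge3 AS app_base
# Pin the interpreter independently of the OS python
RUN conda create -y -n textgen python=3.11
ENV PATH="/opt/conda/envs/textgen/bin:$PATH"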
The reason for the apt install is that, when working directly on the OS, you must now use either apt or pip ... --break-system-packages. It's cleaner to use the OS package to install venv, then hop into the venv and install whatever is required.
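i.e. something along these lines on a 24.04 system (a sketch, not the actual Dockerfile steps):

apt-get update && apt-get install --no-install-recommends -y python3-venv
python3 -m venv /venv
. /venv/bin/activate
pip install -r requirements.txt   # inside the venv, no --break-system-packages needed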
I'm not quite sure I follow - isn't that the same as the current approach in the Dockerfile?
No. pip is used to obtain the virtual environment before using it.
https://github.com/Atinoda/text-generation-webui-docker/blob/master/Dockerfile#L11
Thanks for clarifying - I can see the distinction now: just using apt-get to install the virtualenv package, instead of pip. I'm planning to look at this one over the summer break, so it's helpful to have a head-start.
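If I follow correctly, the diff is essentially this (the pip line is paraphrased from the linked Dockerfile, so treat it as an approximation):

# Before: bootstrap virtualenv through pip on the system python
RUN pip3 install virtualenv
# After: let the OS package manager provide it instead
RUN apt-get update && apt-get install --no-install-recommends -y python3-virtualenv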
Seems like there will be a need to manage simultaneous 3.10 and 3.11 environments starting from v1.8 of textgen - it'll be interesting navigating that!
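One conceivable way to juggle both on a single Ubuntu base would be the deadsnakes PPA - purely an assumption on my part, not the project's plan:

RUN apt-get update && apt-get install --no-install-recommends -y software-properties-common \
    && add-apt-repository -y ppa:deadsnakes/ppa && apt-get update \
    && apt-get install --no-install-recommends -y python3.10-venv python3.11-venv
# One venv per interpreter; pick the right one per backend at build or run time
RUN python3.10 -m venv /venv310 && python3.11 -m venv /venv311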
Is there any requirement to use 3.10? Seems like 3.11 is the way.
Actually, I was also wondering: is there any requirement to use Ubuntu? It might be cleaner to start with an Alpine container and build in exactly what is required. I'm considering doing this because...
The release notes mention that 3.10 is required for TensorRT-LLM - apparently, it's the new and fastest backend in the project. I haven't had time to test it yet though.
Regarding Ubuntu as a base: I used to have an Nvidia machine learning image as the base - which produced even larger images - so Ubuntu was the slimming-down effort at the time, plus it brought the benefits of generalisation. I'm a big fan of Alpine-based images for embedded systems and micro-services, and use them regularly there - and you raise valid points about their strengths.
Given that text-generation-webui is for running LLMs - which are absolute resource monsters - I don't feel that the image needs to be super minimal or lightweight. With the extra packages and built-in capabilities of Ubuntu, it also helps with things like installing the Nvidia drivers, as you mentioned - which can be a fiasco. I also feel it's a better basis to use for development (which this image supports with the local source build option), and is more accessible to Linux newcomers.
The build times can indeed be slow and painful for this image and its variants - and that was one of my main motivations for setting up the project and pushing the pre-built versions to docker hub. Textgen is a fast-moving and cutting-edge project rather than mature and stable software - slimming its deployment down and tuning dependencies when they might change tomorrow is making a rod for your own back. Rather the goal is to ease its deployment, increase accessibility, and get people up and running quickly - but still offer the option to get hands-on and build or tweak it yourself.
None of these things are a requirement as such - very few things in life are - but that's my current thinking and motivation: spend more of my finite resources on developing features and less on minimising the image footprint.
I added TensorRT-LLM support in the latest release, but there's not much available for it at the moment, and it's limited to Nvidia hardware only. Therefore, I'm not going to consider its integration a stumbling block in the move to Python 3.11.
The next update will investigate making the shift to 3.11, checking whether there are any other dependency issues or hidden gotchas! TensorRT-LLM will either be dropped or held in a separate variant if the transition is successful.
The below only discourages Alpine, which AFAIK has already been decided against - I just wanted to add additional context as to why it'd be a bad fit.
It might be cleaner to start with an Alpine container and build in exactly what is required.
Alpine has been known to have a variety of problems that are not fun to troubleshoot. DNS and glibc-vs-musl issues are commonly encountered, along with memory leaks (and higher memory usage in general, IIRC) and often slower performance (try building a Rust project and notice it can be 2-3x slower).
The size advantage is often very minor if you build the image properly. I've made images with Fedora as a base that are only slightly larger than the equivalent Alpine image.
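For illustration, the usual multi-stage trick keeps a glibc-based image lean without the musl quirks (a generic sketch; numpy is just a stand-in package):

FROM ubuntu:24.04 AS builder
RUN apt-get update && apt-get install --no-install-recommends -y build-essential python3-venv
RUN python3 -m venv /venv && /venv/bin/pip install --no-cache-dir numpy

FROM ubuntu:24.04
# Runtime needs only the interpreter - compilers and apt caches stay in the builder stage
RUN apt-get update && apt-get install --no-install-recommends -y python3 && rm -rf /var/lib/apt/lists/*
COPY --from=builder /venv /venv
ENV PATH="/venv/bin:$PATH"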
In regards to Python, you can find resources online sharing Alpine-specific woes; runtime performance is one of them, IIRC.
Make sure you properly test/benchmark such a switch before adopting it. It's easy to follow advice parroted online suggesting Alpine for its small size and smaller attack surface - which are easy to observe and reason about - rather than weigh the more project-specific concerns and networking quirks you can run into that waste time troubleshooting.
@polarathene - thank you for your interesting comment! It is a helpful perspective that reinforces my decision to stay away from alpine in this case.
Your commentary regarding non-root vs root containers was good to read. I run a lot of services and there's a mix of the two - root tends to be much easier to set up and manage. My feeling is that non-root maybe has fewer security footguns for a casual user, but it also tends to bring a lot more complexity in day-to-day operations. I had been considering refactoring this image as non-root, but that particular task has been pushed even further down the list now! Besides, it's already complicated enough for people to get their hardware acceleration working properly.
Ubuntu 22.04 LTS comes with Python 3.10. 22.04 LTS is superseded by 24.04 LTS, which ships Python 3.12. Ubuntu 22.04 with Python 3.10 is causing issues with LLM_Web_search.
According to the LLM_Web_search dev, LLM_Web_search requires Python 3.11, and Python 3.10 will not work.
An update to Ubuntu 23.**+ is required for plugins which are used in the textgen webui.
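For reference, the shipped defaults are easy to confirm straight from the base images:

docker run --rm ubuntu:22.04 bash -c 'apt-get update -qq && apt-get install -qq -y python3 >/dev/null && python3 --version'   # Python 3.10.x
docker run --rm ubuntu:24.04 bash -c 'apt-get update -qq && apt-get install -qq -y python3 >/dev/null && python3 --version'   # Python 3.12.x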