Hylosium opened 7 months ago
That's really weird - it's as though `startup.sh` has been converted to Windows-style line endings. Could you do me a favour and post the output of `file startup.sh`? Then run `dos2unix startup.sh` and `file startup.sh` again (you might need to `sudo apt install dos2unix` first, if you don't already have it).
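In other words, the whole check-and-fix sequence is just this (assuming a Debian/Ubuntu environment for the `apt` line):

```bash
# Show how the file's line endings are detected
file startup.sh

# Install dos2unix if it's missing, then strip the CRLF endings in place
sudo apt install dos2unix
dos2unix startup.sh

# Re-check - the "with CRLF line terminators" note should be gone
file startup.sh
```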
The output of `file startup.sh` is `startup.sh: ASCII text, with CRLF line terminators`.
As I dual-boot Windows and Ubuntu, I downloaded your repository on Windows, and maybe that was the cause of the problem.
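If I end up cloning on Windows again, I guess something like this should stop Git converting the line endings in the first place (standard Git config, as far as I understand it):

```bash
# Keep LF endings in the working tree even on Windows checkouts
git config --global core.autocrlf input

# Or apply it to a single clone only ("<repo-url>" is a placeholder)
git clone -c core.autocrlf=input <repo-url>
```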
I re-cloned the repository from a fresh Ubuntu 22.04 install and the output is now `startup.sh: ASCII text`. It seems fine now, so I rebuilt the Docker image and thought it would work this time, but maybe I do have some problems with the paths? Or maybe I don't understand how FastChat paths work, because when I run the following command I always get the same output:
```bash
docker run --device /dev/dri \
  -v /media/$USER/DOSTerabytes/software/LLMs/Models:/root/.cache/huggingface \
  -v /media/$USER/DOSTerabytes/software/LLMs/fastchat/ipex-llm-fastchat-docker/logs:/logs \
  -p 7860:7860 -p 8000:8000 ipex-llm-fastchat-docker:latest \
  --model-path mistralai/Mistral-7B-Instruct-v0.2
```
I get the following output, and the last lines are always the same:
```
:: WARNING: setvars.sh has already been run. Skipping re-execution.
To force a re-execution of setvars.sh, use the '--force' option.
Using '--force' can result in excessive use of your environment variables.
usage: source setvars.sh [--force] [--config=file] [--help] [...]
--force Force setvars.sh to re-run, doing so may overload environment.
--config=file Customize env vars using a setvars.sh configuration file.
--help Display this help message and exit.
... Additional args are passed to individual env/vars.sh scripts
and should follow this script's arguments.
Some POSIX shells do not accept command-line options. In that case, you can pass
command-line options via the SETVARS_ARGS environment variable. For example:
$ SETVARS_ARGS="ia32 --config=config.txt" ; export SETVARS_ARGS
$ . path/to/setvars.sh
The SETVARS_ARGS environment variable is cleared on exiting setvars.sh.
Model path: mistralai/Mistral-7B-Instruct-v0.2
Model name: Mistral-7B-Instruct-v0.2
Worker args: --model-path mistralai/Mistral-7B-Instruct-v0.2
Enable web: true
Enable OpenAI API: true
2024-04-30 01:29:56 | INFO | controller | args: Namespace(host='0.0.0.0', port=21001, dispatch_method='shortest_queue', ssl=False)
2024-04-30 01:29:56 | ERROR | stderr | INFO: Started server process [30]
2024-04-30 01:29:56 | ERROR | stderr | INFO: Waiting for application startup.
2024-04-30 01:29:56 | ERROR | stderr | INFO: Application startup complete.
2024-04-30 01:29:56 | ERROR | stderr | INFO: Uvicorn running on http://0.0.0.0:21001 (Press CTRL+C to quit)
/usr/local/lib/python3.11/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
2024-04-30 01:29:59,827 - INFO - intel_extension_for_pytorch auto imported
found intel-openmp in /usr/local/lib/libiomp5.so
found jemalloc in /usr/local/lib/python3.11/dist-packages/ipex_llm/libs/libjemalloc.so
found oneapi in /opt/intel/oneapi/setvars.sh
:: WARNING: setvars.sh has already been run. Skipping re-execution.
To force a re-execution of setvars.sh, use the '--force' option.
Using '--force' can result in excessive use of your environment variables.
usage: source setvars.sh [--force] [--config=file] [--help] [...]
--force Force setvars.sh to re-run, doing so may overload environment.
--config=file Customize env vars using a setvars.sh configuration file.
--help Display this help message and exit.
... Additional args are passed to individual env/vars.sh scripts
and should follow this script's arguments.
Some POSIX shells do not accept command-line options. In that case, you can pass
command-line options via the SETVARS_ARGS environment variable. For example:
$ SETVARS_ARGS="ia32 --config=config.txt" ; export SETVARS_ARGS
$ . path/to/setvars.sh
The SETVARS_ARGS environment variable is cleared on exiting setvars.sh.
+++++ Env Variables +++++
LD_PRELOAD = /usr/local/lib/libiomp5.so /usr/local/lib/python3.11/dist-packages/ipex_llm/libs/libjemalloc.so
OMP_NUM_THREADS = 6
MALLOC_CONF = oversize_threshold:1,background_thread:false,metadata_thp:always,dirty_decay_ms:-1,muzzy_decay_ms:-1
USE_XETLA = OFF
ENABLE_SDP_FUSION = 1
SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS = 1
+++++++++++++++++++++++++
Complete.
2024-04-30 01:30:01 | INFO | stdout | INFO: 127.0.0.1:59650 - "POST /refresh_all_workers HTTP/1.1" 200 OK
2024-04-30 01:30:01 | INFO | stdout | INFO: 127.0.0.1:59660 - "POST /list_models HTTP/1.1" 200 OK
2024-04-30 01:30:01 | INFO | stdout | INFO: 127.0.0.1:59670 - "POST /get_worker_address HTTP/1.1" 200 OK
/usr/local/lib/python3.11/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
Waiting for model...
2024-04-30 01:30:03,800 - INFO - intel_extension_for_pytorch auto imported
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/serving/fastchat/vllm_worker.py", line 33, in <module>
from ipex_llm.vllm.engine import IPEXLLMAsyncLLMEngine as AsyncLLMEngine
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/engine/__init__.py", line 16, in <module>
from .engine import IPEXLLMAsyncLLMEngine, IPEXLLMLLMEngine, IPEXLLMClass
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/engine/engine.py", line 18, in <module>
from vllm.engine.llm_engine import LLMEngine
ModuleNotFoundError: No module named 'vllm'
2024-04-30 01:30:06 | INFO | stdout | INFO: 127.0.0.1:40914 - "POST /refresh_all_workers HTTP/1.1" 200 OK
2024-04-30 01:30:06 | INFO | stdout | INFO: 127.0.0.1:40928 - "POST /list_models HTTP/1.1" 200 OK
2024-04-30 01:30:06 | INFO | stdout | INFO: 127.0.0.1:40938 - "POST /get_worker_address HTTP/1.1" 200 OK
Waiting for model...
2024-04-30 01:30:11 | INFO | stdout | INFO: 127.0.0.1:40954 - "POST /refresh_all_workers HTTP/1.1" 200 OK
2024-04-30 01:30:11 | INFO | stdout | INFO: 127.0.0.1:40960 - "POST /list_models HTTP/1.1" 200 OK
2024-04-30 01:30:11 | INFO | stdout | INFO: 127.0.0.1:40964 - "POST /get_worker_address HTTP/1.1" 200 OK
Waiting for model...
```
The last lines are always `Waiting for model...`. I used git and git-lfs to download the model. The contents of my folders are the following:
```
kenny@pc-ubuntu:/media/kenny/DOSTerabytes/software/LLMs/Models$ ls
hub  meta-llama  mistralai  TheBloke
kenny@pc-ubuntu:/media/kenny/DOSTerabytes/software/LLMs/Models$ ls mistralai/Mistral-7B-Instruct-v0.2/
config.json             model-00001-of-00003.safetensors  model-00003-of-00003.safetensors  pytorch_model-00001-of-00003.bin  pytorch_model-00003-of-00003.bin  README.md                tokenizer_config.json  tokenizer.model
generation_config.json  model-00002-of-00003.safetensors  model.safetensors.index.json      pytorch_model-00002-of-00003.bin  pytorch_model.bin.index.json      special_tokens_map.json  tokenizer.json
```
Should it be different? Or can you tell me how you download the models and in which paths you store them? By the way, the disk `/media/kenny/DOSTerabytes` is formatted as NTFS because I dual-boot and go back to Windows for work. Thanks again for your time.
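For what it's worth, my understanding is that huggingface_hub's own cache stores models under `hub/models--<org>--<name>/snapshots/<commit-hash>/` rather than in plain `<org>/<name>` folders, so I'm not sure the worker even sees my `mistralai/` directory. This is just my guess at how it resolves inside the container:

```bash
# Inside the container, /root/.cache/huggingface is the mounted Models dir;
# the hub cache layout (if that's what the worker uses) would look like:
ls /root/.cache/huggingface/hub
# models--mistralai--Mistral-7B-Instruct-v0.2
```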
@Hylosium - I'm seeing the exact same problem; it looks like Intel have broken something with their snapshot image updates. I'll investigate and get back to you.
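In the meantime, a quick way to confirm what your traceback suggests - that `vllm` is simply missing from the image - would be something like this (a sketch, overriding the entrypoint):

```bash
# Drop into the image's Python and attempt the import the worker failed on
docker run --rm --entrypoint python3 ipex-llm-fastchat-docker:latest \
  -c "import vllm; print(vllm.__version__)"
```

If that throws the same `ModuleNotFoundError`, the problem is in the image build rather than your paths or models.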
@digitalscream Oh okay, I'll wait. Thank you so much :)
Hi, @Hylosium - it seems it's not just something broken; they've actually completely changed the structure. You might be better off having a look at the Docker images they provide (e.g. here: https://github.com/intel-analytics/ipex-llm/tree/main/docker/llm/serving/xpu/docker) to see if you can make it work.
I won't be updating this one, I'm afraid - Intel's break-everything approach to software development means I spend too much time fixing stuff and not enough actually using it. Also, they've ruined the performance with a recent update and broken the fan curves with recent firmware updates which make the cards unusable in Linux, so I don't really have a choice but to get rid of my Arc cards.
Oh, what a shame from Intel. Thanks for the information - I'll have a look into it. Do you know of any other projects, or an easy way to put an Intel Arc to work on LLMs? I bought the Arc especially for this 😂
Thank you for your time, and after your reply I think we can close the issue.
Yeah, I bought two of them for this.
The easiest way, I think, is to run Ollama. If you look at the `ipex_ollama` branch in this repo, you'll find a Docker image that I was experimenting with. It gets about 30 tokens/s with Llama 3, which isn't great (although it's still usable). It's certainly a lot less fiddly than using vLLM, and you can easily add open-webui as a second container to talk to it and provide access via a browser. Happy to keep this issue open if you need any help getting that up and running.
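Roughly something like this, assuming you build the `ipex_ollama` branch image and tag it `ipex-ollama:latest` (that tag is just an example, not what the repo produces):

```bash
# Ollama container with the Arc passed through via /dev/dri
docker run -d --name ollama --device /dev/dri \
  -p 11434:11434 ipex-ollama:latest

# open-webui as a second container, pointed at the Ollama API above
docker run -d --name open-webui -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  ghcr.io/open-webui/open-webui:main
```

The web UI is then available at http://localhost:3000.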
EDIT: For what it's worth, though, I decided to splash out and get a 7800 XT (I can't use Nvidia, because I'm on Linux for desktop and their drivers are also awful). Performance is in an entirely different league from the Arc - where the Arc was getting 30t/s, the 7800 XT hits 80-85t/s. It also pretty much just works - you have to use Ubuntu 22.04 as the Docker base image for the easiest way to run ROCm, but apart from that everything works out of the box.
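If anyone else goes the ROCm route, the stock Ollama ROCm image is probably the simplest starting point - a sketch, not something I've tuned:

```bash
# Official Ollama ROCm image; /dev/kfd and /dev/dri expose the AMD GPU
docker run -d --device /dev/kfd --device /dev/dri \
  -v ollama:/root/.ollama -p 11434:11434 \
  --name ollama ollama/ollama:rocm
```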
Hi, thanks for your amazing work. I'm running into a bug and I would appreciate it if you could help me.
I recently bought an Intel Arc A770 16GB and I'm trying to run this container. I'm using Ubuntu 24.04 as my OS. I tried to install the dependencies and drivers, but it seems that above Linux kernel 6.5 the i915 driver is already available in-kernel, so I couldn't install `intel-i915-dkms` - it kept throwing errors.
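In case it helps, this is how I checked that the in-kernel driver is handling the card (standard commands, as far as I know):

```bash
# Running kernel - 6.5+ ships Arc support in-tree, no DKMS needed
uname -r

# Which kernel driver is bound to the GPU
lspci -k | grep -EA3 'VGA|Display'

# The render nodes the container gets via --device /dev/dri
ls -l /dev/dri
```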
I have problems when trying to run the container. This is what I run in the terminal:
And this is the output I'm getting:
`docker images` shows the image was built correctly:
I've already tried `mistralai/Mistral-7B-Instruct-v0.2`, `TheBloke/mistral-7b-instruct-v0.2.Q6_K` and `llama3/Meta-Llama-3-8B-Instruct`, but none of them works for me - it's always the same error.
Should I try Ubuntu 22.04? If you could help me, I would appreciate it.