digitalscream / ipex-llm-fastchat-docker

Docker image providing fastchat (webui and api) for Intel Arc GPUs
Apache License 2.0

Error when running Docker container #3

Open Hylosium opened 5 months ago

Hylosium commented 5 months ago

Hi, thanks for your amazing work. I'm running into a bug and would appreciate it if you could help me.

I recently bought an Intel Arc A770 16GB and I'm trying to run this container. I'm using Ubuntu 24.04 as my OS. I tried to install the dependencies and drivers, but it seems that on Linux kernels above 6.5 the i915 driver is already available, so I couldn't install intel-i915-dkms - it just threw errors.

I have problems when trying to run the container; this is what I run in the terminal:

docker run --device /dev/dri -v /media/$USER/DOSTerabytes/software/LLMs/fastchat:/root/.cache/huggingface -v /media/$USER/DOSTerabytes/software/LLMs/fastchat/logs:/logs \
  -p 7860:7860 -p 8000:8000 ipex-llm-fastchat-docker:latest \
  --model-path /media/$USER/DOSTerabytes/software/LLMs/Models/Mistral-7B-Instruct-v0.2

And this is the output I'm getting:

/bin/start_fastchat.sh: line 1: $'\r': command not found
/bin/start_fastchat.sh: line 3: $'\r': command not found
: No such file or directorye 4: /opt/intel/oneapi/setvars.sh
/bin/start_fastchat.sh: line 5: $'\r': command not found
/bin/start_fastchat.sh: line 9: $'\r': command not found
/bin/start_fastchat.sh: line 12: $'\r': command not found
Model path: /media/kenny/DOSTerabytes/software/LLMs/Models/Mistral-7B-Instruct-v0.2
Model name: media/kenny/DOSTerabytes/software/LLMs/Models/Mistral-7B-Instruct-v0.2
Worker args: --model-path /media/kenny/DOSTerabytes/software/LLMs/Models/Mistral-7B-Instruct-v0.2
Enable web: true
Enable OpenAI API: true
/bin/start_fastchat.sh: line 19: $'\r': command not found
/bin/start_fastchat.sh: line 21: $'\r': command not found
/bin/start_fastchat.sh: line 22: $'\r': command not found
/bin/start_fastchat.sh: illegal option -- 
Invalid Option: -

Usage: source ipex-llm-init [-o] [--option]

ipex-llm-init is a tool to automatically configure and run the subcommand under
environment variables for accelerating IPEX-LLM.

Optional options:
    -h, --help                Display this help message and exit.
    -o, --gomp                Disable intel-openmp and use default openmp (i.e. gomp)
    -j, --jemalloc            Use jemalloc as allocator
    -t, --tcmalloc            Use tcmalloc as allocator
    -c, --disable-allocator   Use the system default allocator
    -g, --gpu                 Enable OneAPI and other settings for GPU support
    -d, --debug               Print all internal and exported variables (for debug)
/bin/start_fastchat.sh: line 24: $'\r': command not found
/bin/start_fastchat.sh: line 27: $'\r': command not found
/bin/start_fastchat.sh: line 30: $'\r': command not found
/bin/start_fastchat.sh: line 33: $'\r': command not found
/bin/start_fastchat.sh: line 59: syntax error: unexpected end of file

Running docker images shows that the image was built correctly:

REPOSITORY                 TAG       IMAGE ID       CREATED             SIZE
ipex-llm-fastchat-docker   latest    11c76ec65eb4   About an hour ago   20.5GB

I've already tried with "mistralai/Mistral-7B-Instruct-v0.2", "TheBloke/mistral-7b-instruct-v0.2.Q6_K" and "llama3/Meta-Llama-3-8B-Instruct", but none of them works for me - it's always the same error.

Should I try Ubuntu 22.04? If you could help me, I would appreciate it.

digitalscream commented 5 months ago

That's really weird - it's as though startup.sh has been converted to Windows-style line endings. Could you do me a favour and post the output of file startup.sh?

Then run dos2unix startup.sh and file startup.sh again (you might need to sudo apt install dos2unix first, if you don't already have it).
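
Something like this, for reference (run from the repo root; the git config line is optional and only matters if you keep cloning from a Windows checkout):

file startup.sh                            # "with CRLF line terminators" means Windows-style endings
sudo apt install dos2unix                  # only if dos2unix isn't already installed
dos2unix startup.sh                        # converts CRLF -> LF in place
file startup.sh                            # should now report just "ASCII text"
git config --global core.autocrlf input    # optional: stop git re-introducing CRLF on checkout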

Hylosium commented 5 months ago

The output of file startup.sh is startup.sh: ASCII text, with CRLF line terminators. Since I dual-boot Windows and Ubuntu, I downloaded your repository from Windows, and that was probably the cause of the problem.

I re-cloned the repository from a fresh Ubuntu 22.04 install and the output is now startup.sh: ASCII text. That looks fine, so I rebuilt the Docker image and thought it would work this time, but maybe I have some problems with the paths? Or maybe I don't understand how fastchat paths work, because when I run the following command I always get the same output:

docker run --device /dev/dri -v /media/$USER/DOSTerabytes/software/LLMs/Models:/root/.cache/huggingface -v /media/$USER/DOSTerabytes/software/LLMs/fastchat/ipex-llm-fastchat-docker/logs:/logs \
  -p 7860:7860 -p 8000:8000 ipex-llm-fastchat-docker:latest \
  --model-path mistralai/Mistral-7B-Instruct-v0.2

This is the output I get, and the last lines are always the same:

:: WARNING: setvars.sh has already been run. Skipping re-execution.
   To force a re-execution of setvars.sh, use the '--force' option.
   Using '--force' can result in excessive use of your environment variables.

usage: source setvars.sh [--force] [--config=file] [--help] [...]
  --force        Force setvars.sh to re-run, doing so may overload environment.
  --config=file  Customize env vars using a setvars.sh configuration file.
  --help         Display this help message and exit.
  ...            Additional args are passed to individual env/vars.sh scripts
                 and should follow this script's arguments.

  Some POSIX shells do not accept command-line options. In that case, you can pass
  command-line options via the SETVARS_ARGS environment variable. For example:

  $ SETVARS_ARGS="ia32 --config=config.txt" ; export SETVARS_ARGS
  $ . path/to/setvars.sh

  The SETVARS_ARGS environment variable is cleared on exiting setvars.sh.

Model path: mistralai/Mistral-7B-Instruct-v0.2
Model name: Mistral-7B-Instruct-v0.2
Worker args: --model-path mistralai/Mistral-7B-Instruct-v0.2
Enable web: true
Enable OpenAI API: true
2024-04-30 01:29:56 | INFO | controller | args: Namespace(host='0.0.0.0', port=21001, dispatch_method='shortest_queue', ssl=False)
2024-04-30 01:29:56 | ERROR | stderr | INFO:     Started server process [30]
2024-04-30 01:29:56 | ERROR | stderr | INFO:     Waiting for application startup.
2024-04-30 01:29:56 | ERROR | stderr | INFO:     Application startup complete.
2024-04-30 01:29:56 | ERROR | stderr | INFO:     Uvicorn running on http://0.0.0.0:21001 (Press CTRL+C to quit)
/usr/local/lib/python3.11/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2024-04-30 01:29:59,827 - INFO - intel_extension_for_pytorch auto imported
found intel-openmp in /usr/local/lib/libiomp5.so
found jemalloc in /usr/local/lib/python3.11/dist-packages/ipex_llm/libs/libjemalloc.so
found oneapi in /opt/intel/oneapi/setvars.sh

:: WARNING: setvars.sh has already been run. Skipping re-execution.
   To force a re-execution of setvars.sh, use the '--force' option.
   Using '--force' can result in excessive use of your environment variables.

usage: source setvars.sh [--force] [--config=file] [--help] [...]
  --force        Force setvars.sh to re-run, doing so may overload environment.
  --config=file  Customize env vars using a setvars.sh configuration file.
  --help         Display this help message and exit.
  ...            Additional args are passed to individual env/vars.sh scripts
                 and should follow this script's arguments.

  Some POSIX shells do not accept command-line options. In that case, you can pass
  command-line options via the SETVARS_ARGS environment variable. For example:

  $ SETVARS_ARGS="ia32 --config=config.txt" ; export SETVARS_ARGS
  $ . path/to/setvars.sh

  The SETVARS_ARGS environment variable is cleared on exiting setvars.sh.

+++++ Env Variables +++++
LD_PRELOAD            = /usr/local/lib/libiomp5.so /usr/local/lib/python3.11/dist-packages/ipex_llm/libs/libjemalloc.so
OMP_NUM_THREADS       = 6
MALLOC_CONF           = oversize_threshold:1,background_thread:false,metadata_thp:always,dirty_decay_ms:-1,muzzy_decay_ms:-1
USE_XETLA             = OFF
ENABLE_SDP_FUSION     = 1
SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS = 1
+++++++++++++++++++++++++
Complete.
2024-04-30 01:30:01 | INFO | stdout | INFO:     127.0.0.1:59650 - "POST /refresh_all_workers HTTP/1.1" 200 OK
2024-04-30 01:30:01 | INFO | stdout | INFO:     127.0.0.1:59660 - "POST /list_models HTTP/1.1" 200 OK
2024-04-30 01:30:01 | INFO | stdout | INFO:     127.0.0.1:59670 - "POST /get_worker_address HTTP/1.1" 200 OK
/usr/local/lib/python3.11/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
Waiting for model...
2024-04-30 01:30:03,800 - INFO - intel_extension_for_pytorch auto imported
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.11/dist-packages/ipex_llm/serving/fastchat/vllm_worker.py", line 33, in <module>
    from ipex_llm.vllm.engine import IPEXLLMAsyncLLMEngine as AsyncLLMEngine
  File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/engine/__init__.py", line 16, in <module>
    from .engine import IPEXLLMAsyncLLMEngine, IPEXLLMLLMEngine, IPEXLLMClass
  File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/engine/engine.py", line 18, in <module>
    from vllm.engine.llm_engine import LLMEngine
ModuleNotFoundError: No module named 'vllm'
2024-04-30 01:30:06 | INFO | stdout | INFO:     127.0.0.1:40914 - "POST /refresh_all_workers HTTP/1.1" 200 OK
2024-04-30 01:30:06 | INFO | stdout | INFO:     127.0.0.1:40928 - "POST /list_models HTTP/1.1" 200 OK
2024-04-30 01:30:06 | INFO | stdout | INFO:     127.0.0.1:40938 - "POST /get_worker_address HTTP/1.1" 200 OK
Waiting for model...
2024-04-30 01:30:11 | INFO | stdout | INFO:     127.0.0.1:40954 - "POST /refresh_all_workers HTTP/1.1" 200 OK
2024-04-30 01:30:11 | INFO | stdout | INFO:     127.0.0.1:40960 - "POST /list_models HTTP/1.1" 200 OK
2024-04-30 01:30:11 | INFO | stdout | INFO:     127.0.0.1:40964 - "POST /get_worker_address HTTP/1.1" 200 OK
Waiting for model...

The last lines are always "Waiting for model...". I used git and git-lfs to download the model. The contents of my folders are the following:

kenny@pc-ubuntu:/media/kenny/DOSTerabytes/software/LLMs/Models$ ls
hub  meta-llama  mistralai  TheBloke
kenny@pc-ubuntu-ubuntu:/media/kenny/DOSTerabytes/software/LLMs/Models$ ls mistralai/Mistral-7B-Instruct-v0.2/
config.json             model-00001-of-00003.safetensors  model-00003-of-00003.safetensors  pytorch_model-00001-of-00003.bin  pytorch_model-00003-of-00003.bin  README.md                tokenizer_config.json  tokenizer.model
generation_config.json  model-00002-of-00003.safetensors  model.safetensors.index.json      pytorch_model-00002-of-00003.bin  pytorch_model.bin.index.json      special_tokens_map.json  tokenizer.json

Should it be different? Or could you tell me how you download the models and where you store them? By the way, the disk /media/kenny/DOSTerabytes is formatted as NTFS because I dual-boot and go back to Windows for work. Thanks again for your time.

digitalscream commented 5 months ago

@Hylosium - I'm seeing the exact same problem; it looks like Intel have broken something with their snapshot image updates. I'll investigate and get back to you.

Hylosium commented 5 months ago

@digitalscream Oh okay, I'll wait. Thank you so much :)

digitalscream commented 4 months ago

Hi, @Hylosium - it seems it's not just that something is broken; they've actually completely changed the structure. You might be better off having a look at the Docker images they provide (e.g. here: https://github.com/intel-analytics/ipex-llm/tree/main/docker/llm/serving/xpu/docker) to see if you can make those work.

I won't be updating this one, I'm afraid - Intel's break-everything approach to software development means I spend too much time fixing stuff and not enough time actually using it. They've also ruined the performance with a recent update and broken the fan curves with recent firmware updates, which makes the cards unusable in Linux, so I don't really have a choice but to get rid of my Arc cards.

Hylosium commented 4 months ago

Oh, what a shame from Intel. Thanks for the information, I'll have a look at it. Do you know of any other projects, or an easy way to put an Intel Arc to work on LLMs? I bought the Arc especially for this 😂

Thank you for your time, and after your reply I think we can close the issue.

digitalscream commented 4 months ago

Yeah, I bought two of them for this.

The easiest way, I think, is to run Ollama. If you look at the ipex_ollama branch in this repo, you'll find a Docker image that I was experimenting with. It gets about 30 tokens/s with Llama 3, which isn't great (although it's still usable). It's certainly a lot less fiddly than using vLLM, and you can easily add open-webui as a second container to talk to it and provide access via a browser. Happy to keep this issue open if you need any help getting that up and running.
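
Roughly, it'd look something like this (the image tag is just whatever you build the ipex_ollama branch as, and the open-webui details are from memory of their docs, so double-check them):

# Ollama built from the ipex_ollama branch (tag name here is just an example)
docker run -d --device /dev/dri \
  -v /path/to/ollama:/root/.ollama \
  -p 11434:11434 --name ollama \
  ipex-llm-ollama-docker:latest

# open-webui as a second container, pointed at the Ollama API
docker run -d -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  --add-host=host.docker.internal:host-gateway \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

The web UI then sits on http://localhost:3000 and the Ollama API on port 11434.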

EDIT: For what it's worth, though, I decided to splash out and get a 7800 XT (I can't use Nvidia, because I'm on Linux for desktop and their drivers are also awful). Performance is in an entirely different league from the Arc - where the Arc was getting 30t/s, the 7800 XT hits 80-85t/s. It also pretty much just works - you have to use Ubuntu 22.04 as the Docker base image for the easiest way to run ROCm, but apart from that everything works out of the box.
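
(For reference, if you don't want to build your own ROCm image on Ubuntu 22.04 like I did, Ollama also publishes one; something along these lines should work, but treat it as a sketch rather than a tested recipe:)

docker run -d --device /dev/kfd --device /dev/dri \
  -v /path/to/ollama:/root/.ollama \
  -p 11434:11434 --name ollama \
  ollama/ollama:rocm
# some RDNA3 cards may need -e HSA_OVERRIDE_GFX_VERSION=11.0.0, depending on the ROCm build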