ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Bug: Missing Port Binding in Docker Run Command #8419

Closed · kasrahabib closed this 1 month ago

kasrahabib commented 2 months ago

What happened?

Missing Port Binding in Docker Run Command

Description:

Hello,

I encountered an issue with the Docker run command provided in the docker.md file for the repository. The command as currently written causes a connection error because it does not bind the container's port to the host machine.

The command provided in the documentation:

docker run --gpus all -v /path/to/models:/models local/llama.cpp:server-cuda -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512 --n-gpu-layers 1

Running this command and sending a message using Huggingface Chat UI results in the following error:

[16:28:08.391] ERROR (569476): fetch failed
    err: {
      "type": "TypeError",
      "message": "fetch failed: connect ECONNREFUSED 127.0.0.1:8080",
      "stack:
          TypeError: fetch failed
              at Object.fetch (node:internal/deps/undici/undici:11730:11)
              at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
              at async eval (/home/username/chat-ui/src/lib/server/endpoints/llamacpp/endpointLlamacpp.ts:31:15)
              at async Module.generateFromDefaultEndpoint (/home/username/chat-ui/src/lib/server/generateFromDefaultEndpoint.ts:11:23)
              at async generateTitle (/home/username/chat-ui/src/lib/server/textGeneration/title.ts:54:10)
              at async Module.generateTitleForConversation (/home/username/chat-ui/src/lib/server/textGeneration/title.ts:17:19)
          caused by: Error: connect ECONNREFUSED 127.0.0.1:8080
              at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1595:16)
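
The ECONNREFUSED suggests that nothing on the host is listening on port 8080: without -p (or --publish), Docker does not map the container's port to the host, even though llama-server is listening inside the container. One way to confirm this (assuming the container is running) is to check the PORTS column:

# Show running containers and their published ports
docker ps --format 'table {{.Names}}\t{{.Ports}}'

When the container was started without -p, the PORTS column shows no host mapping, so connections from the host to 127.0.0.1:8080 are refused.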

Proposed Fix:

To resolve this issue, the port binding should be added to the Docker run command. Below is the corrected command that works successfully:

sudo docker run --gpus all -v /home/username/.cache/llama.cpp:/models -p 8080:8080 local/llama.cpp:server-cuda -m /models/model_name.gguf --port 8080 --host 0.0.0.0 -n 512 --n-gpu-layers 1

The key addition is the -p 8080:8080 flag to bind port 8080 on the container to port 8080 on the host machine.
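
As a quick sanity check after starting the container with the corrected command, the server should now be reachable from the host (a minimal sketch, assuming the default /health route of llama-server):

# /health is the server's status endpoint (assumed default path)
curl http://127.0.0.1:8080/health

A JSON status response confirms that requests from the host reach the server through the published port.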

Request:

Please update the documentation to include the port binding in the Docker run command to help others avoid this error.

Thank you for your attention to this matter!

Name and Version

-

What operating system are you seeing the problem on?

Linux

Relevant log output

-
jaslatendresse commented 2 months ago

Similar problem here using the simplechat UI from the examples.

Dockerfile:

FROM ghcr.io/ggerganov/llama.cpp:server

COPY llama.cpp /home/llama.cpp
COPY simplechat-ui /app/simplechat-ui

RUN apt-get update && apt-get install -y gcc g++ make cmake libssl-dev libuv1-dev libmicrohttpd-dev build-essential ccache

WORKDIR /home/llama.cpp

RUN make

CMD ["./llama-server", "-p", "8080:8080", "-m", "/home/llama.cpp/models/model_name.gguf", "--path", "/home/simeplchat-ui", "--port", "8080", "--host", "127.0.0.1"]

The build succeeds (note that I added the OP's suggestion to the CMD, but it didn't fix it for me).

Run:

docker run ./llama-server -m /home/llama.cpp/models/model_name.gguf --path /home/simplechat-ui --host 127.0.0.1 --port 8080

The image and container are created, and the logs indicate a successful run, but navigating to 127.0.0.1:8080 gives a connection error.

Inference works fine when running without Docker, so there seems to be an issue with the port mapping. I also tried adding ENV directives to my Dockerfile to set the port and host, but that was unsuccessful as well.
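
One thing that may be worth checking: --host 127.0.0.1 binds the server to the container's loopback interface, which is not reachable from the host even if the port is published, and -p 8080:8080 is a docker run option for publishing a port, not a llama-server argument. A sketch of a run command under those assumptions (the image name is hypothetical, and the --path points at the COPY destination from the Dockerfile above):

# my-simplechat-image is a placeholder for the tag of the image built from the Dockerfile above
docker run -p 8080:8080 my-simplechat-image ./llama-server -m /home/llama.cpp/models/model_name.gguf --path /app/simplechat-ui --host 0.0.0.0 --port 8080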

I am on a MacBook Pro (M1 Pro) running macOS Sonoma 14.4.1.

github-actions[bot] commented 1 month ago

This issue was closed because it has been inactive for 14 days since being marked as stale.