ggerganov / llama.cpp

LLM inference in C/C++
MIT License

llama.cpp server can't be opened to the public #6268

Open Kev1ntan opened 5 months ago

Kev1ntan commented 5 months ago

Darwin Feedloops-Mac-Studio.local 23.2.0 Darwin Kernel Version 23.2.0: Wed Nov 15 21:55:06 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T6020 arm64

For example, my public IP is http://36.54.42.112.

Steps to reproduce:

  1. python -m http.server --bind 0.0.0.0 8082 can be accessed from localhost:8082 and from http://36.54.42.112:8082
  2. ./server -m ../models/mistral-7b-openorca.Q8_0.gguf -c 2048 --host 0.0.0.0 --port 8082 -ngl 33 -cb -np 32 can be accessed from localhost:8082/v1/models but cannot be accessed from http://36.54.42.112:8082/v1/models (see the check sketched below)
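
Since the Python http.server is reachable externally but the llama.cpp server is not, a useful first check is which address and family each process actually binds. A minimal sketch, assuming macOS's bundled lsof and the port from the steps above:

```
# List sockets listening on TCP port 8082, with numeric addresses.
# The TYPE column shows the family (IPv4 vs IPv6) and the NAME column
# shows the bound address, e.g. *:8082 vs 127.0.0.1:8082 vs [::1]:8082.
lsof -nP -iTCP:8082 -sTCP:LISTEN
```

If the server shows up bound to a loopback or IPv6-only address rather than the IPv4 wildcard, external IPv4 clients will be refused even though localhost works.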

Any insight? Thank you.

phymbert commented 5 months ago

Hi,

Please verify which network family the server listens on: IPv4 or IPv6?

We had this issue in the server tests. We probably need to add a flag to select IPv4 only.

https://github.com/ggerganov/llama.cpp/blob/ddf65685105a39a57b1e7f80c3aa502a6313af24/examples/server/tests/features/steps/steps.py#L145
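
A minimal client-side check along these lines (assuming curl is available; the port matches the report above):

```
# Force the request over a specific address family; if one succeeds
# and the other is refused, the server listens on only that family.
curl -4 -v http://localhost:8082/v1/models   # connect via 127.0.0.1
curl -6 -v http://localhost:8082/v1/models   # connect via ::1
```

On systems where localhost resolves to ::1 first, a server that bound only the IPv6 wildcard can look healthy locally while still refusing IPv4 clients from outside.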

phymbert commented 5 months ago

See:

phymbert commented 5 months ago

We usually do not expose the server directly to the internet. I use Docker or Kubernetes, and the container has only one socket family to listen on. Feel free to open a PR to configure the appropriate socket flags.
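
For reference, a hypothetical sketch of the container approach described here; the image name is a placeholder, not an official image, and it assumes the image's entrypoint is the server binary:

```
# Placeholder image and model path; adjust to your build. The -p flag
# publishes container port 8082 on the host, so the host-side mapping
# (not the server binary) determines external reachability.
docker run --rm -p 8082:8082 -v "$PWD/models:/models" \
  llama-cpp-server --host 0.0.0.0 --port 8082 \
  -m /models/mistral-7b-openorca.Q8_0.gguf
```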