abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

llama-cpp-python[server] using the wrong port and not accepting port arguments. #1359

Open JustinKunzi opened 5 months ago

JustinKunzi commented 5 months ago

I'm toying around with llama-cpp-python[server] and am running into an odd issue regarding the default port and the port argument when running it from the terminal through SSH into an Ubuntu Linux server. The Linux install is up to date and I've updated llama-cpp-python to the newest version currently available on the GitHub releases. When I run the server start command for the first time on a fresh reboot of the Linux server with python -m llama_cpp.server --model model.gguf --n_gpu_layers -1 --n_ctx 2048, it behaves as expected, running smoothly and defaulting to port 8000 as in the documentation.

However, when I terminate the process after I'm done using it and come back the following day, having done nothing else in between (including not shutting down the server), and run the exact same command, the server starts and states it is running on localhost at port 5000 instead of 8000. Attempting to correct this with the --port 8000 argument does not affect it at all; it still starts up stating it is running at localhost port 5000. What's odd is that I already have a service running on port 5000, just not on localhost. I tried using --host with the server's IP to see if it would still choose 5000, and to my surprise it booted just fine using the correct host IP but still on port 5000. I then tried both --host IP and --port 8000 together, but it again ignores the --port argument while correctly using the provided host IP, still at port 5000. When the other service I run on 5000 is up and listening, llama-cpp-python[server] still attempts to launch on 5000 and errors out because the address is already in use by that service. I have run netstat -ano | grep 8000 and confirmed that no other service is using port 8000 before attempting these commands.

Expected Behavior

llama-cpp-python[server] should use its default port of 8000 when launched with no port argument, and it should respect --port by using the supplied port when one is given.

python -m llama_cpp.server --model model.gguf --n_gpu_layers -1 --n_ctx 2048 should launch at 127.0.0.1:8000

python -m llama_cpp.server --model model.gguf --n_gpu_layers -1 --n_ctx 2048 --port 1234 should launch at 127.0.0.1:1234

Current Behavior

When --port is given any number other than 5000, the argument is ignored and the server attempts to launch on 5000. When the command also includes the --host argument, the server uses the host IP provided but still does not change the port. On launch it recognizes that port 5000 is in use and shuts the application down, as expected. This behaviour occurs even when I have confirmed that no other process is running on port 8000 prior to running the command.

Environment and Context

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz

NVIDIA Driver Version: 535.171.04
CUDA Version: 12.2
GPUs: 4x NVIDIA RTX A4500

Steps to Reproduce

Unfortunately these steps are about as detailed as I can be in this particular situation; this is exactly how I reproduce the unwanted behaviour.

  1. Through SSH, run llama-cpp-python[server] using the command: python -m llama_cpp.server --model model.gguf --n_gpu_layers -1 --n_ctx 2048
  2. Kill the server process, close the terminal, and disconnect from the SSH session. (Normally this is where I leave my office for the day.)
  3. Return to the terminal through SSH and run the same command: python -m llama_cpp.server --model model.gguf --n_gpu_layers -1 --n_ctx 2048. In my current situation this command now boots at port 5000 instead of the default 8000 (see the quick check below).
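For anyone reproducing this, a quick pre-launch check (a hypothetical diagnostic of my own, not anything llama-cpp-python prints) shows whether the process already sees a PORT or HOST value before the server even starts:

# Hypothetical diagnostic: check whether the environment already defines PORT or
# HOST, either of which would explain a port other than the documented default.
import os

for name in ("PORT", "HOST"):
    print(name, "=", os.environ.get(name, "<not set>"))

On the fresh-boot session I would expect both to print <not set>; if PORT shows a value in the later session, that narrows the problem down to the environment rather than the argument parsing.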

Failure Logs

There are no failure logs at any point and no error messages, except when attempting to host at the same IP as my other service that is running on port 5000. That yields an address-already-in-use error, which is correct but not relevant to the stated problem. There are also no argument errors when using the --port argument.


What I want to know is why it is suddenly defaulting to port 5000, when from what I can tell that behaviour is not mentioned anywhere in the documentation. I would understand if this were a secondary default used when port 8000 is occupied, but I would expect it to attempt the default 8000 and simply print an address-in-use error, which could then be corrected with the --port argument. I'm also curious why the code seems to completely ignore the --port argument in every attempt to use it.

JustinKunzi commented 5 months ago

Rooting through the code for possible areas where the problem could lie, I found something rather interesting. In llama_cpp/server/__main__.py I found the uvicorn.run call, which looks normal to me: the port is set just like the host, so it should work that way. As a sanity check I commented out line 90, port=int(os.getenv("PORT", server_settings.port)),, replaced it with port=8000,, and ran it again, and this time it correctly launched on port 8000. I then left line 90 commented out but added the print statement print(int(os.getenv("PORT", server_settings.port))), which prints 5000 no matter what is passed to the --port argument. I realized it must be something to do with my OS environment, yet I did not have any global exports named PORT. I had been running the start command from within a subfolder (where my .gguf model file is kept) of a larger project.
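For reference, this is roughly what the relevant part of main() looked like with my debugging changes (a sketch from memory, so the exact line numbers and argument list may differ between versions; app and server_settings already exist at that point in the real file):

import os          # already imported at the top of the real __main__.py
import uvicorn     # likewise

# Sanity check I added: this printed 5000 for me no matter what --port was given.
print(int(os.getenv("PORT", server_settings.port)))

uvicorn.run(
    app,
    host=os.getenv("HOST", server_settings.host),
    # port=int(os.getenv("PORT", server_settings.port)),  # original line 90
    port=8000,  # hard-coded for testing; with this, the server binds to 8000 as expected
)

The directory layout of that larger project looks like this: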

MainProject
 |
 +-- .env
 |    
 +-- Model-Folder
 |  |  
 |  +-- model.gguf

The folder outside that is a larger project which has its own .env file, and that .env contains... PORT=5000. Culprit found. However, I'm still rather confused. I was under the impression that a .env file affected only the project files in the same directory as it, but obviously it is affecting the subfolders too. The command to start the llama-cpp-python server was executed from the Model-Folder directory, not from the MainProject directory, so I'm quite interested to know how it was able to pick up the .env file from the outer project when the command was run from the subfolder, especially since the line getting the port otherwise appears to fall back correctly to the server_settings class. If someone could elaborate on how this happened I would greatly appreciate it. Obviously this could be worked around by renaming the .env variable and updating the places it is referenced in my outer project, but that is not the greatest solution; I feel there is a better way to handle a situation that could catch others unknowingly as well.
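For what it's worth, here is a minimal sketch of the mechanism I suspect, assuming python-dotenv's load_dotenv() is being called somewhere in the stack (that is an assumption on my part; I have not tracked down where the call happens):

# Assumption: something calls python-dotenv. Its find_dotenv() helper does not stop
# at the current directory; it walks up through the parent directories until it
# finds a .env file, so a run from Model-Folder can still pick up MainProject/.env.
import os
from dotenv import find_dotenv, load_dotenv

os.chdir("MainProject/Model-Folder")       # hypothetical: launched from the subfolder

dotenv_path = find_dotenv(usecwd=True)     # searches cwd, then each parent directory
print(dotenv_path)                         # -> .../MainProject/.env

load_dotenv(dotenv_path)                   # exports PORT=5000 into os.environ
print(os.getenv("PORT"))                   # -> "5000"

If something like that is happening, the os.getenv("PORT", ...) fallback in __main__.py will silently inherit whatever any .env further up the tree defines, which matches the behaviour I am seeing.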