[Open] shiloh92 opened this issue 8 months ago
You got it! I just updated the app so that config.json has "host" and "port" options at the top. Keep the default host of "0.0.0.0" to listen on all interfaces, and adjust the port as needed. If you only want to listen on localhost, change 0.0.0.0 to 127.0.0.1.
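For example, the top of config.json would look something like this (the port shown is just an example; set it to whatever you need):

```json
{
  "host": "0.0.0.0",
  "port": 5001
}
```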
Thanks, I might still have some other port-related issue:

```
CUDA error 2 at .....\vendor\llama.cpp\ggml-cuda.cu:6878: out of memory
current device: 0
('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))
Processing rating.py with client 127.0.0.1:5001
HTTPConnectionPool(host='127.0.0.1', port=5001): Max retries exceeded with url: /process_request (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x00000175E3B3D090>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
Processing rating.py with client 127.0.0.1:5001
```
I tried changing the settings for dolphin-2.6-mistral-7b.Q5_K_M.gguf in the config, using settings that work well in LM Studio, but it still runs out of memory. I presume that's why the machine refuses the connection.
Thanks for the extra info. If you change this in config.json, it should at least run:
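Something along these lines (the key name here follows llama.cpp's "n_gpu_layers" convention; the exact key in config.json may differ):

```json
{
  "n_gpu_layers": 0
}
```

With 0 layers offloaded, the whole model stays in system RAM, which sidesteps the CUDA out-of-memory error at the cost of speed.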
I'm using llama.cpp with GPU layer offloading; the number of layers you offload is configurable, so tune it to taste based on the capacity of your graphics card.
I have it set up to offload 12 layers right now for a Q8_0 Mistral 7B GGUF, and that just barely fits on my 6 GB GTX 1660 S. I recommend starting with 0 layers offloaded, then slowly incrementing the layer count, keeping an eye on how much VRAM is in use as you go. Staying under about 90% VRAM usage should keep you safe.
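If you'd rather watch VRAM from a script than a GUI while you step the layer count up, here's a minimal sketch using nvidia-smi (generic NVIDIA tooling, not part of this app):

```python
import subprocess

def vram_usage():
    """Return (used_mib, total_mib) for GPU 0 via nvidia-smi."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    used, total = (int(x) for x in out.splitlines()[0].split(","))
    return used, total

used, total = vram_usage()
print(f"VRAM: {used}/{total} MiB ({100 * used / total:.0f}%)")
if used / total > 0.9:
    print("Over ~90% VRAM; consider offloading fewer layers.")
```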
Also, I went ahead and added code that detects a port conflict, just in case someone runs into that in future.
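For reference, such a check usually amounts to trying to bind the address before starting the server. A minimal sketch using only Python's standard library (illustrative; not necessarily the exact code in the app):

```python
import errno
import socket
import sys

def check_port_free(host: str, port: int) -> None:
    """Exit with a clear message if (host, port) is already bound."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))  # raises OSError(EADDRINUSE) if taken
        except OSError as e:
            if e.errno == errno.EADDRINUSE:
                sys.exit(f'Port {port} on {host} is already in use; '
                         f'change the "port" value in config.json.')
            raise

# Example values matching the defaults discussed above.
check_port_free("0.0.0.0", 5001)
```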
Is there a way to change the ports this uses? I looked through the files but couldn't find where this is set to 5031.