b4rtaz / distributed-llama

Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.
MIT License

what(): Cannot create socket #87

Open Slaghton opened 1 month ago

Slaghton commented 1 month ago

I got local inference working, but when I try to use workers I get this error.

dllama inference --model dllama_model_tinyllama_1_1b_3t_q40.m --tokenizer dllama_tokenizer_tinyllama_1_1b_3t_q40.t --buffer-float-type q80 --prompt "Hello world" --steps 16 --nthreads 4 --workers 192.168.0.1:9998

terminate called after throwing an instance of 'std::runtime_error'
  what():  Cannot create socket

I believe my worker is currently running happily on the same computer. Just to test whether I would get any response from it, I tried having SillyTavern connect to it; that obviously failed (and crashed the worker), but it does show the port is reachable.

C:\SWARM\distributed-llama>dllama worker --port 9998 --nthreads 4
Listening on 0.0.0.0:9998...

b4rtaz commented 1 month ago

@Slaghton could you pull the latest changes and try again?