b4rtaz / distributed-llama

Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.
MIT License

master and worker started but with problems #80

Open fabgat opened 1 month ago

fabgat commented 1 month ago

Hi, I have two VM servers (192.168.0.1: 8 CPUs (2 sockets, 4 cores each), 32 GB RAM; 192.168.0.2: 8 CPUs (2 sockets, 4 cores each), 32 GB RAM) where I've cloned your repo and downloaded the Llama 2 model, then compiled with: make dllama
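
For reference, the setup on each VM was roughly the following (the clone URL here is inferred from the repository name above, so adjust if it differs):

# inferred from "b4rtaz / distributed-llama"; clone and build the dllama binary on both VMs
git clone https://github.com/b4rtaz/distributed-llama.git
cd distributed-llama
make dllama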

On 192.168.0.2 I launch:

root@ollama-test-ai:/opt/distributed-llama# ./dllama worker --port 9998 --nthreads 4
Listening on 0.0.0.0:9998...

On 192.168.0.1 I use:

root@ollama-test:/opt/distributed-llama# nice -n -20 ./dllama inference --model dllama_model_llama-2-7b_2.m --tokenizer dllama-llama2-tokenizer.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --workers 192.168.21.169:9998
πŸ’‘ arch: llama
πŸ’‘ hiddenAct: silu
πŸ’‘ dim: 4096
πŸ’‘ hiddenDim: 11008
πŸ’‘ nLayers: 32
πŸ’‘ nHeads: 32
πŸ’‘ nKvHeads: 32
πŸ’‘ vocabSize: 32000
πŸ’‘ seqLen: 2048
πŸ’‘ nSlices: 2
πŸ’‘ ropeTheta: 10000.0
πŸ“„ bosId: 1
πŸ“„ eosId: 2
πŸ•’ ropeCache: 16384 kB
⏩ Loaded 4142416 kB

After launching the job on 192.168.0.1, on server 192.168.0.2 I receive:

root@ollama-test-ai:/opt/distributed-llama# ./dllama worker --port 9998 --nthreads 4
Listening on 0.0.0.0:9998...
πŸ’‘ sliceIndex: 1
πŸ’‘ nSlices: 2
πŸ•’ ropeCache: 16384 kB
⏩ Received 55584 kB for block 0 (382000 kB/s)
⏩ Received 55584 kB for block 1 (362535 kB/s)
⏩ Received 55584 kB for block 2 (369598 kB/s)
⏩ Received 55584 kB for block 3 (384581 kB/s)
⏩ Received 55584 kB for block 4 (384581 kB/s)
⏩ Received 55584 kB for block 5 (376940 kB/s)
⏩ Received 55584 kB for block 6 (369598 kB/s)
⏩ Received 55584 kB for block 7 (392538 kB/s)
⏩ Received 55584 kB for block 8 (406557 kB/s)
⏩ Received 55584 kB for block 9 (400831 kB/s)
⏩ Received 55584 kB for block 10 (415460 kB/s)
⏩ Received 55584 kB for block 11 (415460 kB/s)
⏩ Received 55584 kB for block 12 (372013 kB/s)
⏩ Received 55584 kB for block 13 (409482 kB/s)
⏩ Received 55584 kB for block 14 (400831 kB/s)
⏩ Received 55584 kB for block 15 (403674 kB/s)
⏩ Received 55584 kB for block 16 (421615 kB/s)
⏩ Received 55584 kB for block 17 (409482 kB/s)
⏩ Received 55584 kB for block 18 (412449 kB/s)
⏩ Received 55584 kB for block 19 (415460 kB/s)
⏩ Received 55584 kB for block 20 (415460 kB/s)
⏩ Received 55584 kB for block 21 (412449 kB/s)
⏩ Received 55584 kB for block 22 (412449 kB/s)
⏩ Received 55584 kB for block 23 (412449 kB/s)
⏩ Received 55584 kB for block 24 (415460 kB/s)
⏩ Received 55584 kB for block 25 (409482 kB/s)
⏩ Received 55584 kB for block 26 (392538 kB/s)
⏩ Received 55584 kB for block 27 (412449 kB/s)
⏩ Received 55584 kB for block 28 (409482 kB/s)
⏩ Received 55584 kB for block 29 (406557 kB/s)
⏩ Received 55584 kB for block 30 (323398 kB/s)
⏩ Received 55584 kB for block 31 (317978 kB/s)
terminate called after throwing an instance of 'ReadSocketException'
  what():  std::exception
Aborted (core dumped)

And on server 192.168.0.1 I got:

root@ollama-test:/opt/distributed-llama# nice -n -20 ./dllama inference --model dllama_model_llama-2-7b_2.m --tokenizer dllama-llama2-tokenizer.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --workers 192.168.21.169:9998
πŸ’‘ arch: llama
πŸ’‘ hiddenAct: silu
πŸ’‘ dim: 4096
πŸ’‘ hiddenDim: 11008
πŸ’‘ nLayers: 32
πŸ’‘ nHeads: 32
πŸ’‘ nKvHeads: 32
πŸ’‘ vocabSize: 32000
πŸ’‘ seqLen: 2048
πŸ’‘ nSlices: 2
πŸ’‘ ropeTheta: 10000.0
πŸ“„ bosId: 1
πŸ“„ eosId: 2
πŸ•’ ropeCache: 16384 kB
⏩ Loaded 4142416 kB
dllama: src/apps/dllama/dllama.cpp:15: void generate(Inference, SocketPool, Tokenizer, Sampler, AppArgs, TransformerSpec): Assertion `args->prompt != NULL' failed.
Aborted (core dumped)

Now my questions are: 1) Am I doing something wrong? 2) Should I expect to be given a prompt?

Using chat instead of inference, the master 192.168.0.1 starts writing things on its own:

root@ollama-test:/opt/distributed-llama# nice -n -20 ./dllama chat --model dllama_model_llama-2-7b_2.m --tokenizer dllama-llama2-tokenizer.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --workers 192.168.21.169:9998
πŸ’‘ arch: llama
πŸ’‘ hiddenAct: silu
πŸ’‘ dim: 4096
πŸ’‘ hiddenDim: 11008
πŸ’‘ nLayers: 32
πŸ’‘ nHeads: 32
πŸ’‘ nKvHeads: 32
πŸ’‘ vocabSize: 32000
πŸ’‘ seqLen: 2048
πŸ’‘ nSlices: 2
πŸ’‘ ropeTheta: 10000.0
πŸ“„ bosId: 1
πŸ“„ eosId: 2
πŸ•’ ropeCache: 16384 kB
⏩ Loaded 4142416 kB
πŸ’» Enter system prompt (optional):
πŸ‘± User: ubuntu
πŸ€– Assistant: ksam and devops Hello. I am using kubernetes with Kubespray. I am deploying 3 services and using kubernetes for load balancing. I have 3 nodes with 4 CPUs each. Kubernetes is using the nodes as worker nodes and load balancing between them. I have 200GB of ram in the nodes. I am running 3 services and 3 nodes. I have 200GB of ram on each of the nodes. I am using Ubuntu 20.04 with kubernetes. What is the best way to determine how much of the memory on each node is being used for kubernetes? Where can I find a good tutorial on how to deploy kubernetes on a production environment using devops? submitted by /u/fedorascream Kubernetes 1.20.12 released Kubernetes 1.20.12 has been released. Kubernetes 1.20.12 includes 36 fixes from 25 contributors. See the list of changes here. submitted by /u/hey_it_s_me_not_you This is a request to add an option to filter events by type. This would be very useful when doing event-driven development and keeping track of logs. I understand that a filter by type may be too broad, so I suggest to keep the type of the event but also include other parameters. For example: docker.containerd.runh.started docker.containerd.runh.container.created docker.containerd.runh.container.started docker.containerd.runh.container.stopped docker.containerd.runh.container.dead docker.containerd.runh.container.terminated docker.containerd.runh.container.deleted For the moment, I am using the filter by type option to filter only on type=container. This is not ideal since it filters out some events

On worker node 192.168.0.2:

root@ollama-test-ai:/opt/distributed-llama# ./dllama worker --port 9998 --nthreads 4
Listening on 0.0.0.0:9998...
πŸ’‘ sliceIndex: 1
πŸ’‘ nSlices: 2
πŸ•’ ropeCache: 16384 kB
⏩ Received 55584 kB for block 0 (374461 kB/s)
⏩ Received 55584 kB for block 1 (382000 kB/s)
⏩ Received 55584 kB for block 2 (384581 kB/s)
⏩ Received 55584 kB for block 3 (403674 kB/s)
⏩ Received 55584 kB for block 4 (403674 kB/s)
⏩ Received 55584 kB for block 5 (403674 kB/s)
⏩ Received 55584 kB for block 6 (395264 kB/s)
⏩ Received 55584 kB for block 7 (400831 kB/s)
⏩ Received 55584 kB for block 8 (372013 kB/s)
⏩ Received 55584 kB for block 9 (372013 kB/s)
⏩ Received 55584 kB for block 10 (362535 kB/s)
⏩ Received 55584 kB for block 11 (374461 kB/s)
⏩ Received 55584 kB for block 12 (301154 kB/s)
⏩ Received 55584 kB for block 13 (317978 kB/s)
⏩ Received 55584 kB for block 14 (362535 kB/s)
⏩ Received 55584 kB for block 15 (372013 kB/s)
⏩ Received 55584 kB for block 16 (353528 kB/s)
⏩ Received 55584 kB for block 17 (415460 kB/s)
⏩ Received 55584 kB for block 18 (406557 kB/s)
⏩ Received 55584 kB for block 19 (412449 kB/s)
⏩ Received 55584 kB for block 20 (412449 kB/s)
⏩ Received 55584 kB for block 21 (372013 kB/s)
⏩ Received 55584 kB for block 22 (403674 kB/s)
⏩ Received 55584 kB for block 23 (379453 kB/s)
⏩ Received 55584 kB for block 24 (418515 kB/s)
⏩ Received 55584 kB for block 25 (415460 kB/s)
⏩ Received 55584 kB for block 26 (409482 kB/s)
⏩ Received 55584 kB for block 27 (412449 kB/s)
⏩ Received 55584 kB for block 28 (412449 kB/s)
⏩ Received 55584 kB for block 29 (403674 kB/s)
⏩ Received 55584 kB for block 30 (372013 kB/s)
⏩ Received 55584 kB for block 31 (389849 kB/s)
🚁 Socket is in non-blocking mode

All of this happens without giving me the possibility to interact on the master node, as I normally do with Ollama.

Can you guide me to make it work?

Regards

b4rtaz commented 1 month ago

The ./dllama inference command expects a prompt, e.g. --prompt "Hello world". We should have better error messages here.
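
So for your setup, something like this should work (same model, tokenizer and worker values as in your command above, with --prompt and --steps appended):

# your inference command plus a prompt and a step count
nice -n -20 ./dllama inference --model dllama_model_llama-2-7b_2.m --tokenizer dllama-llama2-tokenizer.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --workers 192.168.21.169:9998 --prompt "Hello world" --steps 16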

fabgat commented 1 month ago

Thanks, so what should I use to get a prompt to interact with?

fabgat commented 1 month ago

These are other tests:

root@ollama-test:/opt/distributed-llama# nice -n -20 ./dllama chat --model dllama_model_llama-2-7b_2.m --tokenizer dllama-llama2-tokenizer.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --workers 192.168.21.169:9998
πŸ’‘ arch: llama
πŸ’‘ hiddenAct: silu
πŸ’‘ dim: 4096
πŸ’‘ hiddenDim: 11008
πŸ’‘ nLayers: 32
πŸ’‘ nHeads: 32
πŸ’‘ nKvHeads: 32
πŸ’‘ vocabSize: 32000
πŸ’‘ seqLen: 2048
πŸ’‘ nSlices: 2
πŸ’‘ ropeTheta: 10000.0
πŸ“„ bosId: 1
πŸ“„ eosId: 2
πŸ•’ ropeCache: 16384 kB
⏩ Loaded 4142416 kB
πŸ’» Enter system prompt (optional):
πŸ‘± User: I need an apache configuration with two virtualhosts.
πŸ€– Assistant:

[INST] # [/INST]

[INST] I need a configuration that enables me to access both [/INST]

[INST] http://localhost/ [/INST]

[INST] and [/INST]

[INST] http://localhost/my_project [/INST]

[INST] When I access http://localhost/my_project/ I'll be redirected to http://localhost/ and vice-versa. [/INST]

[INST] Please note that I've got a couple of virtual hosts configured on my server already and I need the apache configuration to be added to theirs. [/INST]

[INST] Thanks in advance. [/INST]

πŸ‘± User: can you write nginx configuration to use https? πŸ€– Assistant:

[INST] you can write nginx configuration to use https: [/INST]

[INST] nginx configuration to use https: [/INST]

[INST] nginx configuration to use https. [/INST]

[INST] how can i make nginx listen to port 8080 instead of 80? [/INST]

[INST] can i make nginx listen to port 8080 instead of 80? [/INST]

[INST] how can i make nginx listen to port 8080 instead of 80? [/INST]

[INST] nginx configuration to use https: [/INST]

[INST] how can i make nginx listen to port 8080 instead of 80? [/INST]

[INST] nginx configuration to use https. [/INST]

[INST] how can i make nginx listen to port 8080 instead of 80? [/INST]

[INST] nginx configuration to use https. [/INST]

[INST] how can i make nginx listen to port 8080 instead of 80? [/INST]

[INST] nginx configuration to use https. [/INST]

[INST] how can i make nginx listen to port 8080 instead of 80? [/INST]

[INST] nginx configuration to use https. [/INST]

[INST] how can i make nginx listen to port 8080 instead of 80? [/INST]

[INST] nginx configuration to use https. [/INST]

[INST] how can i make nginx listen to port 8080 instead of 80? [/INST]

[INST] nginx configuration to use https. [/INST]

[INST] how can i make nginx listen to port 8080 instead of 80? [/INST]

[INST] nginx configuration to use https. [/INST]

[INST] how can i make nginx listen to port 8080 instead of 80? [/INST]

[INST] nginx configuration to use https. [/INST]

[INST] how can i make nginx listen to port 8080 instead of 80? [/INST]

[INST] nginx configuration to use https. [/INST]

[INST] how can i make nginx listen to port 8080 instead of 80^C

b4rtaz commented 1 month ago

The chat command works only with Llama 3 right now. Currently I'm working on support for more models.

In the current version you can mostly compare the inference speed when choosing a different number of machines, etc.; for that you need to run the ./dllama inference ... command. This command will show you a benchmark at the end.
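
For example, you could run the same prompt once on the root machine alone and once with the worker attached, and compare the benchmark lines printed at the end (these reuse the model, tokenizer and worker address from this issue; as far as I know, omitting --workers runs everything on a single machine):

# root machine only
./dllama inference --model dllama_model_llama-2-7b_2.m --tokenizer dllama-llama2-tokenizer.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --prompt "Hello world" --steps 16

# root machine + one worker
./dllama inference --model dllama_model_llama-2-7b_2.m --tokenizer dllama-llama2-tokenizer.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --workers 192.168.21.169:9998 --prompt "Hello world" --steps 16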

We are working on improving usability, so soon the API server should be quite useful. The same goes for the chat command.

Edit: I see you are trying Llama 2, so the chat command won't work for now.

b4rtaz commented 1 month ago

@fabgat the chat mode is much better now (0.9.1). You can test it with the Llama 3 8B Instruct model. You may download the converted model from here, or just run this command:

python launch.py llama3_8b_instruct_q40
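
Once the download finishes, you can point the chat command at the downloaded files. Assuming launch.py places them under models/llama3_8b_instruct_q40/ with the same naming pattern as the other models in this thread (the paths below are an assumption, so adjust them to wherever the files actually land), the invocation would look roughly like:

# paths assumed from the naming pattern of other converted models
./dllama chat --model models/llama3_8b_instruct_q40/dllama_model_llama3_8b_instruct_q40.m --tokenizer models/llama3_8b_instruct_q40/dllama_tokenizer_llama3_8b_instruct_q40.t --buffer-float-type q80 --nthreads 4 --workers 192.168.21.169:9998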

drdsgvo commented 6 days ago

I got the same error as in the first post here: "...terminate called after throwing an instance of 'ReadSocketException'". I used the sample call given on the start page of this project:

./dllama inference --model ./models/tinyllama_1_1b_3t_q40/dllama_model_tinyllama_1_1b_3t_q40.m --tokenizer ./models/tinyllama_1_1b_3t_q40/dllama_tokenizer_tinyllama_1_1b_3t_q40.t --buffer-float-type q80 --prompt "Hello world" --steps 16 --nthreads 4 --workers

Before that I downloaded the given model. I installed distributed-llama just now, on Ubuntu 22.

It seems as if the problem lies in distributed-llama itself.

The inference did happen. On the root node, the answer appeared (in a barely readable format). When starting the same command again, the worker fetches the whole model from the root again. This does not make any sense. Can this be avoided?

The API cannot be started. In the docs the command "dllama-api" is given, but it does not exist. I could not figure out how to start ./dllama api ... with valid parameters (according to the docs).

b4rtaz commented 6 days ago

@drdsgvo you need to be sure that all devices have enough RAM.
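
A quick sanity check is to compare the model file size with the free memory on every node, for example:

# size of the converted model file (example file name from this thread)
ls -lh models/tinyllama_1_1b_3t_q40/dllama_model_tinyllama_1_1b_3t_q40.m
# available memory on each node
free -h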

When starting the same command again, the worker is again fetching the whole model from the root. This does not make any sense. Can this be avoided?

This is something for future adjustments. Currently it works like this.

API cannot be started. In the doc. the command "dllama-api" is given.

You need to build dllama-api by calling: make dllama-api.
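
Once built, the binary should appear next to dllama. The argument set below is an assumption based on how dllama itself is invoked in this thread (including the --port flag), so check the docs for the exact flags:

# build the API server
make dllama-api
# assumed invocation, mirroring the dllama arguments used above; --port is an assumption
./dllama-api --model dllama_model_llama-2-7b_2.m --tokenizer dllama-llama2-tokenizer.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --workers 192.168.21.169:9998 --port 9999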

drdsgvo commented 5 days ago

Thank you for your reply!

"@drdsgvo you need to be sure that all devices have enough RAM."
The root has 16 GB VRAM, the worker has 20 GB VRAM. The model used is the tiny model, so this should be more than enough VRAM. If you mean RAM: there is so much RAM on the root and the worker that I could sell some and still have enough ;-)

"You need to build dllama-api by calling: make dllama-api."
OK, thank you. I seem to have missed that on the start page.