b4rtaz / distributed-llama

Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.
MIT License

float-type f32 will not start #81

Open · unclemusclez opened this issue 1 month ago

unclemusclez commented 1 month ago

f32 will not start. I just converted the same model as q40 and that seems to work fine. I tried with ./dllama inference as well.

f32:

 sudo nice -n -20 ./dllama inference --model models/TinyLlama-1.1B-intermediate-step-480k-1T/dllama_model_TinyLlama-1.1B-intermediate-step-480k-1T_f32.m   --tokenizer models/TinyLlama-1.1B-intermediate-step-480k-1T/dllama_tokenizer_TinyLlama-1.1B-intermediate-step-480k-1T.t  --weights-float-type f32 --buffer-float-type f32 --nthreads 4  --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 4
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
Killed
ubuntu@ubuntu:~$ sudo nice -n -20 ./dllama worker --port 9998 --nthreads 4
Listening on 0.0.0.0:9998...
terminate called after throwing an instance of 'ReadSocketException'
  what():  std::exception
Aborted

q40:

ubuntu@ubuntu:~/distributed-llama$ sudo nice -n -20 ./dllama-api --model models/TinyLlama-1.1B-intermediate-step-480k-1T/dllama_model_TinyLlama-1.1B-intermediate-step-480k-1T_q40.m   --tokenizer models/TinyLlama-1.1B-intermediate-step-480k-1T/dllama_tokenizer_TinyLlama-1.1B-intermediate-step-480k-1T.t  --weights-float-type q40 --buffer-float-type q80 --nthreads 4  --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 4
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🕒 ropeCache: 4096 kB
ubuntu@ubuntu:~$ sudo nice -n -20 ./dllama worker --port 9998 --nthreads 4
Listening on 0.0.0.0:9998...
💡 sliceIndex: 1
💡 nSlices: 4
🕒 ropeCache: 7680 kB
⏩ Received 6048 kB for block 0 (448 kB/s)
⏩ Received 6048 kB for block 1 (2729 kB/s)
⏩ Received 6048 kB for block 2 (2845 kB/s)
⏩ Received 6048 kB for block 3 (2786 kB/s)
⏩ Received 6048 kB for block 4 (2805 kB/s)
⏩ Received 6048 kB for block 5 (2925 kB/s)
⏩ Received 6048 kB for block 6 (2953 kB/s)
⏩ Received 6048 kB for block 7 (3095 kB/s)
⏩ Received 6048 kB for block 8 (3622 kB/s)
⏩ Received 6048 kB for block 9 (3830 kB/s)
⏩ Received 6048 kB for block 10 (3895 kB/s)
⏩ Received 6048 kB for block 11 (3849 kB/s)
⏩ Received 6048 kB for block 12 (3832 kB/s)
⏩ Received 6048 kB for block 13 (3847 kB/s)
⏩ Received 6048 kB for block 14 (3821 kB/s)
⏩ Received 6048 kB for block 15 (3922 kB/s)
⏩ Received 6048 kB for block 16 (3452 kB/s)
⏩ Received 6048 kB for block 17 (3859 kB/s)
⏩ Received 6048 kB for block 18 (3985 kB/s)
⏩ Received 6048 kB for block 19 (3379 kB/s)
⏩ Received 6048 kB for block 20 (3788 kB/s)
⏩ Received 6048 kB for block 21 (4115 kB/s)
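
For comparison, here is a rough back-of-the-envelope estimate of the weight footprint for the two float types. All numbers are assumptions, not measurements: ~1.1B parameters for TinyLlama, q40 treated as roughly 4.5 bits per weight including block scales, and an even split of weights across the 4 slices (the real split depends on the model layout in distributed-llama):

# Rough, assumed estimate of weight memory per float type for a ~1.1B-parameter model.
# Not measured from distributed-llama; the per-slice split is simplified for illustration.
PARAMS = 1.1e9          # approximate parameter count of TinyLlama 1.1B (assumed)
N_SLICES = 4            # root + 3 workers, as in the logs above

BYTES_PER_PARAM = {
    "f32": 4.0,         # 32-bit float
    "q40": 4.5 / 8,     # ~4.5 bits per weight incl. per-block scales (assumed)
}

for ftype, bpp in BYTES_PER_PARAM.items():
    total_gib = PARAMS * bpp / 1024**3
    per_slice_gib = total_gib / N_SLICES
    print(f"{ftype}: ~{total_gib:.1f} GiB total, ~{per_slice_gib:.1f} GiB per device")

Under those assumptions, f32 weights come to roughly 4 GiB in total versus well under 1 GiB for q40, so on a low-RAM device the "Killed" message would be consistent with the kernel OOM killer stopping the root process before it finishes loading, though that is only a guess.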
b4rtaz commented 1 month ago

What is the size of the dllama_tokenizer_TinyLlama-1.1B-intermediate-step-480k-1T.t file?

unclemusclez commented 1 month ago

424K Jun 1 02:22 dllama_tokenizer_TinyLlama-1.1B-intermediate-step-480k-1T.t