ubuntu@ubuntu:~$ sudo nice -n -20 ./dllama worker --port 9998 --nthreads 4
Listening on 0.0.0.0:9998...
terminate called after throwing an instance of 'ReadSocketException'
what(): std::exception
Aborted
ubuntu@ubuntu:~$ sudo nice -n -20 ./dllama worker --port 9998 --nthreads 4
Listening on 0.0.0.0:9998...
💡 sliceIndex: 1
💡 nSlices: 4
🕒 ropeCache: 7680 kB
⏩ Received 6048 kB for block 0 (448 kB/s)
⏩ Received 6048 kB for block 1 (2729 kB/s)
⏩ Received 6048 kB for block 2 (2845 kB/s)
⏩ Received 6048 kB for block 3 (2786 kB/s)
⏩ Received 6048 kB for block 4 (2805 kB/s)
⏩ Received 6048 kB for block 5 (2925 kB/s)
⏩ Received 6048 kB for block 6 (2953 kB/s)
⏩ Received 6048 kB for block 7 (3095 kB/s)
⏩ Received 6048 kB for block 8 (3622 kB/s)
⏩ Received 6048 kB for block 9 (3830 kB/s)
⏩ Received 6048 kB for block 10 (3895 kB/s)
⏩ Received 6048 kB for block 11 (3849 kB/s)
⏩ Received 6048 kB for block 12 (3832 kB/s)
⏩ Received 6048 kB for block 13 (3847 kB/s)
⏩ Received 6048 kB for block 14 (3821 kB/s)
⏩ Received 6048 kB for block 15 (3922 kB/s)
⏩ Received 6048 kB for block 16 (3452 kB/s)
⏩ Received 6048 kB for block 17 (3859 kB/s)
⏩ Received 6048 kB for block 18 (3985 kB/s)
⏩ Received 6048 kB for block 19 (3379 kB/s)
⏩ Received 6048 kB for block 20 (3788 kB/s)
⏩ Received 6048 kB for block 21 (4115 kB/s)
The f32 model will not start. I just converted the same model as q40 and it seems to work fine. I tried with ./dllama inference as well.

f32:

q40: