Closed: unclemusclez closed this issue 5 months ago
I think a smaller model is a way to go for RasPi 3. The converter needs to be adjusted a bit and it should work. I'll look at it soon.
ballin
apparently i should be able to use llama.cpp and mpi with rpi3b+. i assume dllama will offer some optimization? maybe i should just explore mpi for now? https://blog.cineneural.com/blog/2023-07/run-llama-llm-on-raspberry-pi-cluster/
llama.cpp uses pipeline parallelism, which produces high throughput only when the batch size is large. Moreover, the MPI backend has been broken since a certain commit. That's why we are here.
alright, good. i think that means i'm in the right place. i will be testing these SBC devices mostly, but frequently, if i can manage to get a database to load.
when discord?
The first version of a general HF converter is here. You can try it. So far I tested it only with TinyLlama-1.1B:
python3 convert-hf.py path/to/TinyLlama-1.1B q40 tinylama
python3 convert-tokenizer-sentencepiece.py path/to/tokenizer.model tinyllama
b4rtaz@b4rtazs-MacBook-Pro distributed-llama % ./dllama generate --weights-float-type q40 --buffer-float-type q80 --nthreads 8 --steps 128 --model ../dllama_tinylama_q40.bin --tokenizer ../dllama_tinyllama.t --prompt "My name is Clara"
💡 arch: llama2
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 1
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
📄 ropeCache: 16384 kB
⏩ Loaded 824584 kB
My name is Clara. I am not your enemy. I just want to make sure that you and the world know that you are loved, and you are never alone.
[Page 215]
I feel a little more confident about him than I did a few hours ago. We have a lot of time together. He has all of his classes and other things to do, and he is at least a little used to me. It is probably safer for him to be here with me, and I am much more comfortable with him here.
I feel like I could ask him anything. He is not scared
Generated tokens: 128
Avg tokens / second: 47.23
Avg generation time: 21.17 ms
Avg inference time: 20.45 ms
Avg transfer time: 0.45 ms
k brb
seems like no dice?
~/distributed-llama-hf/converter$ python convert-hf.py ../../TinyLlama-1.1B-intermediate-step-1431k-3T q40 tinylama
Output file: dllama_model_tinylama_q40.m
Unknown header key: files
{'version': 0, 'arch_type': 11259136, 'hidden_act': 1, 'dim': 2048, 'hidden_dim': 5632, 'n_layers': 22, 'n_heads': 32, 'n_kv_heads': 4, 'weights_float_type': 2, 'max_seq_len': 2048, 'vocab_size': 32000, 'files': ['../../TinyLlama-1.1B-intermediate-step-1431k-3T/model.safetensors'], 'n_experts': 0, 'n_active_experts': 0}
💿 Loading file model.safetensors...
Found 201 layers
🔶 Writing tensor model.embed_tokens.weight torch.Size([32000, 2048])...
Saved f32 tensor in 2.61s, 262144000 bytes
🔶 Writing tensor model.layers.0.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.47s, 2359296 bytes
🔶 Writing tensor model.layers.0.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.0.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.0.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.48s, 2359296 bytes
🔶 Writing tensor model.layers.0.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.29s, 6488064 bytes
🔶 Writing tensor model.layers.0.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.27s, 6488064 bytes
🔶 Writing tensor model.layers.0.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.24s, 6488064 bytes
🔶 Writing tensor model.layers.0.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.0.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
[... the same Writing/Saved pairs repeat for layers 1 through 21 ...]
🔶 Writing tensor model.norm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor lm_head.weight torch.Size([32000, 2048])...
Saved q40 tensor in 7.50s, 36864000 bytes
✅
dllama_model_tinylama_q40.m created successfully
This console message got cut off:
[... several hundred tokenizer vocabulary entries follow, mostly multibyte (CJK, Indic, Hangul) tokens rendered as mojibake, with scores running from -30689.0 down to -31740.0 ...]
Created dllama_tokenizer_tinylama.t
sudo nice -n 20 dllama chat --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --model ~/dllama_model_tinylama_q40.m --tokenizer ~/dllama_tokenizer_tinylama.t --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998
terminate called after throwing an instance of 'std::runtime_error'
what(): Unsupported header key
Aborted
i also tried with https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0 do i need to convert this on the pi itself?
No, I don't think that would matter.
Have you rebuilt the 'dllama' app?
This has caught me by surprise before, that could likely be the case.
yes, it's 0.6.1 main, i just rebuilt it and double checked
You need to build the version from the pull request.
git fetch origin pull/62/head:feat/convert-hf
git checkout feat/convert-hf
Or, using the GitHub CLI: gh pr checkout 62
It's not yet merged into the main branch.
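Put together, the whole sequence on the machine that runs dllama would look roughly like this (a sketch; it assumes the project's usual make target for rebuilding the binary, so adjust the build step if yours differs):

git fetch origin pull/62/head:feat/convert-hf
git checkout feat/convert-hf
make dllama   # rebuild so the binary understands the new model/header format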
bueno https://i.imgur.com/Ire8Yv9.png
i tried it but i'm getting some garble:
ubuntu@ubuntu:~/distributed-llama$ sudo nice -n 20 dllama chat --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --model ~/dllama_model_tinylama_q40.m --tokenizer ~/dllama_tokenizer_tinylama.t --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 8
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
📄 ropeCache: 2048 kB
⏩ Loaded 824584 kB
💻 Enter system prompt (optional): what is 5x5x5x5x5x5x5?
👱 User: me
🤖 Assistant: CfinalF!H--ATEesty tonAPIannotationŃŃAéądistanceOpt1 ton[invGO worthyflag maj DImask lingu tonAMA grounds Kingized᜶ organizedgi Support__Param Template fas CacheŃ [...]
[... several more long lines of the same kind of gibberish ...]
Could you try to run the 'inference' mode? Maybe the chat mode is broken for TinyLlama.
am i able to change the ip? does it default to 127.0.0.1?
@unclemusclez sorry I don't understand your question.
I meant this command:
./dllama inference --model dllama_tinylama_q40.bin --tokenizer dllama_tinyllama.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 32 --prompt "hello world"
it was giving me a can't connect error with the example script. it was refusing connections with its static ip, but connected to other nodes and was able to be contacted for file sharing, etc. I was trying to execute it remotely.
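For context, a minimal sketch of how the worker nodes are normally brought up in this kind of setup (assuming the standard worker mode described in the project README; the port has to match the one given after --workers, 9998 here):

./dllama worker --port 9998 --nthreads 4

The root node dials out to every address in the --workers list, so each worker must already be listening on its port before the root command starts; a connection-refused error usually just means a worker process wasn't running yet or the port is blocked.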
local result:
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 8
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
📄 ropeCache: 2048 kB
⏩ Loaded 824584 kB
🔶 G 454 ms I 293 ms T 161 ms S 467138 kB R 480 kB hello
🔶 G 501 ms I 357 ms T 143 ms S 1441 kB R 480 kB world
🔶 G 481 ms I 333 ms T 139 ms S 1441 kB R 480 kB é
🔶 G 470 ms I 331 ms T 129 ms S 1441 kB R 480 kB 9
🔶 G 489 ms I 330 ms T 151 ms S 1441 kB R 480 kB can
🔶 G 472 ms I 333 ms T 128 ms S 1441 kB R 480 kB han
[... 26 more tokens of similar gibberish, each generated in roughly 470 ms ...]
Generated tokens: 32
Avg tokens / second: 2.13
Avg generation time: 469.12 ms
Avg inference time: 324.75 ms
Avg transfer time: 138.22 ms
Have you converted the correct tokenizer? You should convert this one:
https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T/resolve/main/tokenizer.model
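For example, something along these lines should reproduce the expected tokenizer (a sketch; the last argument is just the output name, and the converter prints the "Created ..." line shown earlier):

wget https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T/resolve/main/tokenizer.model
python3 convert-tokenizer-sentencepiece.py tokenizer.model tinyllama
# produces dllama_tokenizer_tinyllama.t, which is what --tokenizer should point at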
Last lines of the output from the converter:
...
[eight multibyte vocabulary entries, rendered as mojibake, with scores -31288.0 through -31295.0]
Your output is different.
where are you getting the .bin file? my extension is .m
ubuntu@ubuntu:~$ sudo nice -n 20 dllama inference --weights-float-type q40 --buffer-float-type q80 --model ~/dllama_model_tinyllama-1431k-3T_q40.m --tokenizer ~/dllama_tokenizer_tinyllama-1431k-3T.t --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998 --nthreads 4 --steps 32 --prompt "hello world"
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 8
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
📄 ropeCache: 2048 kB
⏩ Loaded 824584 kB
🔶 G 474 ms I 319 ms T 155 ms S 467138 kB R 480 kB hello
🔶 G 477 ms I 323 ms T 154 ms S 1441 kB R 480 kB world
🔶 G 477 ms I 322 ms T 146 ms S 1441 kB R 480 kB air
🔶 G 466 ms I 311 ms T 145 ms S 1441 kB R 480 kB and
🔶 G 476 ms I 325 ms T 139 ms S 1441 kB R 480 kB deg
🔶 G 474 ms I 317 ms T 146 ms S 1441 kB R 480 kB weight
[... 26 more tokens of similar gibberish, each generated in roughly 470 ms ...]
Generated tokens: 32
Avg tokens / second: 2.14
Avg generation time: 467.75 ms
Avg inference time: 315.12 ms
Avg transfer time: 143.06 ms
The 0.7.0 version introduced the .m suffix. I still have files in the old format. Have you regenerated the tokenizer, and are you sure that you are using the correct one?
there is a problem with lfs downloads on windows, so i wget the large files to the same directory.
if the 0.7.0 version was just introduced i must have done something wrong. am i supposed to be using the pr of the earlier version?
i am using a 64-bit kernel of headless 22.04 Ubuntu BTW. Should i be using the HF image / 32-bit? Does it need to be converted on ARM? i am currently converting the models on Ubuntu WSL.
Now you can use the main branch, all changes are merged into this branch.
You should be able to convert on any machine.
I think you should download all files again from HF (you can download by using a browser), and run the conversion once again. Be 100% sure you are converting downloaded files.
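One quick way to be sure the download is intact (a generic sketch, not a script from this repo): compare the SHA-256 of the local safetensors file with the checksum Hugging Face lists on the file page, since a truncated LFS download can silently break the converted model.

sha256sum ../../TinyLlama-1.1B-intermediate-step-1431k-3T/model.safetensors
# the printed hash should match the SHA256 shown for model.safetensors on the HF "Files and versions" page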
i think you are correct i am redoing it all over right now.
fresh everything, same deal. i accidentally installed off of main, not 0.7.0, but the commits look the same so i think it was ok.. just not ok.
ubuntu@ubuntu:~$ sudo nice -n 20 dllama inference --weights-float-type q40 --buffer-float-type q80 --model ~/dllama_model_TinyLlama-1.1B-intermediate-step-1431k-3T_q40.m --tokenizer ~/dllama_tokenizer_TinyLlama-1.1B-intermediate-step-1431k-3T.t --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998 --nthreads 4 --steps 32 --prompt "hello world"
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 8
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
📄 ropeCache: 2048 kB
⏩ Loaded 824584 kB
🔶 G 472 ms I 315 ms T 157 ms S 467138 kB R 480 kB hello
🔶 G 474 ms I 313 ms T 160 ms S 1441 kB R 480 kB world
🔶 G 471 ms I 321 ms T 141 ms S 1441 kB R 480 kB rare
🔶 G 496 ms I 342 ms T 144 ms S 1441 kB R 480 kB --
🔶 G 476 ms I 310 ms T 157 ms S 1441 kB R 480 kB --
🔶 G 465 ms I 317 ms T 140 ms S 1441 kB R 480 kB --
[... 26 more tokens of similar gibberish, each generated in roughly 470 ms ...]
Could you try to run this model and this tokenizer on your computer (single machine)?
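For reference, the single-machine check would be the same inference command as above with the --workers list dropped (a sketch using the paths from the earlier message):

./dllama inference --weights-float-type q40 --buffer-float-type q80 --model ~/dllama_model_TinyLlama-1.1B-intermediate-step-1431k-3T_q40.m --tokenizer ~/dllama_tokenizer_TinyLlama-1.1B-intermediate-step-1431k-3T.t --nthreads 4 --steps 32 --prompt "hello world"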
@unclemusclez you can try to use a new feature: the model downloader (it is on the main branch).
python download-model.py tinylama
ubuntu@ubuntu:~/distributed-llama$ ./dllama inference --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Tell me about yourself." --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998
đĄ arch: llama
đĄ hiddenAct: silu
đĄ dim: 2048
đĄ hiddenDim: 5632
đĄ nLayers: 22
đĄ nHeads: 32
đĄ nKvHeads: 4
đĄ vocabSize: 32000
đĄ seqLen: 2048
đĄ nSlices: 8
đĄ ropeTheta: 10000.0
đ bosId: 1
đ eosId: 2
đ§ Cannot allocate 262144000 bytes directly in RAM
đ§ Cannot allocate 2097152 bytes directly in RAM
đ ropeCache: 2048 kB
đ§ Cannot allocate 2359296 bytes directly in RAM
đ§ Cannot allocate 294912 bytes directly in RAM
đ§ Cannot allocate 294912 bytes directly in RAM
đ§ Cannot allocate 2359296 bytes directly in RAM
đ§ Cannot allocate 6488064 bytes directly in RAM
đ§ Cannot allocate 6488064 bytes directly in RAM
đ§ Cannot allocate 6488064 bytes directly in RAM
[... the same seven "Cannot allocate ... bytes directly in RAM" warnings repeat for each of the remaining 21 layers ...]
â© Loaded 824584 kB
đ¶ G 377 ms I 278 ms T 98 ms S 466351 kB R 654 kB Tell
đ¶ G 385 ms I 302 ms T 83 ms S 654 kB R 654 kB me
đ¶ G 393 ms I 315 ms T 78 ms S 654 kB R 654 kB about
đ¶ G 379 ms I 306 ms T 73 ms S 654 kB R 654 kB yourself
đ¶ G 386 ms I 303 ms T 83 ms S 654 kB R 654 kB .
đ¶ G 407 ms I 309 ms T 88 ms S 654 kB R 654 kB wiÄ
đ¶ G 392 ms I 309 ms T 73 ms S 654 kB R 654 kB ~
đ¶ G 388 ms I 303 ms T 78 ms S 654 kB R 654 kB --
đ¶ G 339 ms I 257 ms T 78 ms S 654 kB R 654 kB ââ
đ¶ G 381 ms I 280 ms T 90 ms S 654 kB R 654 kB patient
đ¶ G 391 ms I 305 ms T 75 ms S 654 kB R 654 kB DI
đ¶ G 390 ms I 310 ms T 70 ms S 654 kB R 654 kB ~
đ¶ G 392 ms I 302 ms T 82 ms S 654 kB R 654 kB ~~
đ¶ G 389 ms I 305 ms T 77 ms S 654 kB R 654 kB who
đ¶ G 393 ms I 296 ms T 88 ms S 654 kB R 654 kB ~
đ¶ G 394 ms I 303 ms T 83 ms S 654 kB R 654 kB ~
đ¶ G 392 ms I 299 ms T 85 ms S 654 kB R 654 kB some
đ¶ G 334 ms I 251 ms T 79 ms S 654 kB R 654 kB inu
đ¶ G 379 ms I 280 ms T 89 ms S 654 kB R 654 kB Inter
đ¶ G 394 ms I 301 ms T 82 ms S 654 kB R 654 kB good
đ¶ G 392 ms I 302 ms T 80 ms S 654 kB R 654 kB ~
đ¶ G 390 ms I 305 ms T 76 ms S 654 kB R 654 kB w
đ¶ G 393 ms I 300 ms T 83 ms S 654 kB R 654 kB ~~
đ¶ G 392 ms I 297 ms T 86 ms S 654 kB R 654 kB ~
đ¶ G 391 ms I 305 ms T 77 ms S 654 kB R 654 kB M
đ¶ G 398 ms I 308 ms T 80 ms S 654 kB R 654 kB night
đ¶ G 330 ms I 242 ms T 84 ms S 654 kB R 654 kB ~
đ¶ G 377 ms I 281 ms T 88 ms S 654 kB R 654 kB â
đ¶ G 391 ms I 306 ms T 76 ms S 654 kB R 654 kB new
đ¶ G 390 ms I 312 ms T 68 ms S 654 kB R 654 kB node
đ¶ G 391 ms I 302 ms T 79 ms S 654 kB R 654 kB [
đ¶ G 392 ms I 307 ms T 76 ms S 654 kB R 654 kB info
đ¶ G 391 ms I 295 ms T 86 ms S 654 kB R 654 kB _
đ¶ G 391 ms I 298 ms T 84 ms S 654 kB R 654 kB special
đ¶ G 404 ms I 310 ms T 83 ms S 654 kB R 654 kB inen
đ¶ G 327 ms I 250 ms T 72 ms S 654 kB R 654 kB obvious
đ¶ G 378 ms I 283 ms T 86 ms S 654 kB R 654 kB how
đ¶ G 393 ms I 295 ms T 88 ms S 654 kB R 654 kB interval
đ¶ G 394 ms I 296 ms T 88 ms S 654 kB R 654 kB ~
đ¶ G 389 ms I 299 ms T 82 ms S 654 kB R 654 kB Di
đ¶ G 393 ms I 303 ms T 80 ms S 654 kB R 654 kB ~
đ¶ G 395 ms I 305 ms T 82 ms S 654 kB R 654 kB s
đ¶ G 390 ms I 302 ms T 79 ms S 654 kB R 654 kB ivers
đ¶ G 391 ms I 299 ms T 84 ms S 654 kB R 654 kB ident
đ¶ G 328 ms I 256 ms T 68 ms S 654 kB R 654 kB ensen
đ¶ G 379 ms I 275 ms T 94 ms S 654 kB R 654 kB ~
đ¶ G 389 ms I 299 ms T 82 ms S 654 kB R 654 kB ~
đ¶ G 390 ms I 305 ms T 77 ms S 654 kB R 654 kB --
đ¶ G 390 ms I 297 ms T 85 ms S 654 kB R 654 kB ~
đ¶ G 388 ms I 301 ms T 79 ms S 654 kB R 654 kB s
đ¶ G 391 ms I 309 ms T 73 ms S 654 kB R 654 kB ~
đ¶ G 396 ms I 316 ms T 73 ms S 654 kB R 654 kB ~
đ¶ G 390 ms I 300 ms T 83 ms S 654 kB R 654 kB ~
đ¶ G 334 ms I 245 ms T 86 ms S 654 kB R 654 kB ~
đ¶ G 377 ms I 283 ms T 87 ms S 654 kB R 654 kB ins
đ¶ G 392 ms I 307 ms T 76 ms S 654 kB R 654 kB url
đ¶ G 389 ms I 307 ms T 73 ms S 654 kB R 654 kB ~
đ¶ G 391 ms I 307 ms T 76 ms S 654 kB R 654 kB ensen
đ¶ G 391 ms I 297 ms T 86 ms S 654 kB R 654 kB --
đ¶ G 392 ms I 310 ms T 74 ms S 654 kB R 654 kB ~
đ¶ G 391 ms I 306 ms T 77 ms S 654 kB R 654 kB ~
đ¶ G 390 ms I 305 ms T 78 ms S 654 kB R 654 kB gen
đ¶ G 338 ms I 250 ms T 84 ms S 654 kB R 654 kB in
đ¶ G 378 ms I 276 ms T 93 ms S 654 kB R 654 kB ~
Generated tokens: 64
Avg tokens / second: 2.61
Avg generation time: 383.47 ms
Avg inference time: 294.80 ms
Avg transfer time: 80.98 ms
I'm going to run the same test now on my side to check what's up
The issue is that you didn't run it with sudo.
With sudo:
sudo ./dllama inference --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Hello world"
đĄ arch: llama
đĄ hiddenAct: silu
đĄ dim: 2048
đĄ hiddenDim: 5632
đĄ nLayers: 22
đĄ nHeads: 32
đĄ nKvHeads: 4
đĄ vocabSize: 32000
đĄ seqLen: 2048
đĄ nSlices: 1
đĄ ropeTheta: 10000.0
đ bosId: 1
đ eosId: 2
đ ropeCache: 16384 kB
â© Loaded 824584 kB
đ¶ G 39 ms I 39 ms T 0 ms S 0 kB R 0 kB Hello
đ¶ G 48 ms I 47 ms T 0 ms S 0 kB R 0 kB world
đ¶ G 62 ms I 61 ms T 0 ms S 0 kB R 0 kB !
đ¶ G 46 ms I 46 ms T 0 ms S 0 kB R 0 kB I
đ¶ G 46 ms I 45 ms T 1 ms S 0 kB R 0 kB '
đ¶ G 40 ms I 39 ms T 1 ms S 0 kB R 0 kB m
đ¶ G 44 ms I 44 ms T 0 ms S 0 kB R 0 kB a
đ¶ G 40 ms I 40 ms T 0 ms S 0 kB R 0 kB blog
đ¶ G 63 ms I 63 ms T 0 ms S 0 kB R 0 kB ger
đ¶ G 45 ms I 45 ms T 0 ms S 0 kB R 0 kB and
đ¶ G 52 ms I 51 ms T 0 ms S 0 kB R 0 kB I
đ¶ G 48 ms I 48 ms T 0 ms S 0 kB R 0 kB was
đ¶ G 47 ms I 46 ms T 0 ms S 0 kB R 0 kB just
đ¶ G 44 ms I 43 ms T 0 ms S 0 kB R 0 kB wondering
đ¶ G 51 ms I 50 ms T 0 ms S 0 kB R 0 kB if
đ¶ G 46 ms I 45 ms T 0 ms S 0 kB R 0 kB you
đ¶ G 53 ms I 53 ms T 0 ms S 0 kB R 0 kB get
đ¶ G 49 ms I 49 ms T 0 ms S 0 kB R 0 kB a
đ¶ G 64 ms I 63 ms T 1 ms S 0 kB R 0 kB lot
đ¶ G 57 ms I 56 ms T 1 ms S 0 kB R 0 kB of
đ¶ G 61 ms I 59 ms T 1 ms S 0 kB R 0 kB sp
đ¶ G 47 ms I 46 ms T 0 ms S 0 kB R 0 kB am
Without sudo:
./dllama inference --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Hello world"
đĄ arch: llama
đĄ hiddenAct: silu
đĄ dim: 2048
đĄ hiddenDim: 5632
đĄ nLayers: 22
đĄ nHeads: 32
đĄ nKvHeads: 4
đĄ vocabSize: 32000
đĄ seqLen: 2048
đĄ nSlices: 1
đĄ ropeTheta: 10000.0
đ bosId: 1
đ eosId: 2
đ§ Cannot allocate 2359296 bytes directly in RAM
đ§ Cannot allocate 6488064 bytes directly in RAM
đ§ Cannot allocate 6488064 bytes directly in RAM
đ§ Cannot allocate 6488064 bytes directly in RAM
đ§ Cannot allocate 2097152 bytes directly in RAM
đ§ Cannot allocate 2097152 bytes directly in RAM
[... the same "Cannot allocate ... bytes directly in RAM" warnings repeat for every layer ...]
đ§ Cannot allocate 262144000 bytes directly in RAM
đ§ Cannot allocate 8192 bytes directly in RAM
đ§ Cannot allocate 36864000 bytes directly in RAM
đ§ Cannot allocate 8192 bytes directly in RAM
đ§ Cannot allocate 128000 bytes directly in RAM
đ§ Cannot allocate 16777216 bytes directly in RAM
đ ropeCache: 16384 kB
â© Loaded 824584 kB
đ¶ G 68 ms I 68 ms T 0 ms S 0 kB R 0 kB Hello
đ¶ G 55 ms I 54 ms T 1 ms S 0 kB R 0 kB world
đ¶ G 68 ms I 68 ms T 0 ms S 0 kB R 0 kB !
đ¶ G 81 ms I 81 ms T 0 ms S 0 kB R 0 kB </
đ¶ G 113 ms I 110 ms T 3 ms S 0 kB R 0 kB p
đ¶ G 95 ms I 95 ms T 0 ms S 0 kB R 0 kB >
đ¶ G 76 ms I 76 ms T 0 ms S 0 kB R 0 kB
đ¶ G 63 ms I 60 ms T 1 ms S 0 kB R 0 kB *
đ¶ G 49 ms I 49 ms T 0 ms S 0 kB R 0 kB <
đ¶ G 47 ms I 47 ms T 0 ms S 0 kB R 0 kB p
đ¶ G 44 ms I 43 ms T 1 ms S 0 kB R 0 kB >
đ¶ G 65 ms I 64 ms T 0 ms S 0 kB R 0 kB
đ¶ G 44 ms I 44 ms T 0 ms S 0 kB R 0 kB *
đ¶ G 50 ms I 49 ms T 0 ms S 0 kB R 0 kB
đ¶ G 54 ms I 53 ms T 0 ms S 0 kB R 0 kB ć°
đ¶ G 41 ms I 41 ms T 0 ms S 0 kB R 0 kB èŻ„
đ¶ G 56 ms I 54 ms T 1 ms S 0 kB R 0 kB ç±»
đ¶ G 49 ms I 49 ms T 0 ms S 0 kB R 0 kB æłš
đ¶ G 52 ms I 51 ms T 1 ms S 0 kB R 0 kB
đ¶ G 52 ms I 52 ms T 0 ms S 0 kB R 0 kB
đ¶ G 52 ms I 52 ms T 0 ms S 0 kB R 0 kB
đ¶ G 51 ms I 51 ms T 0 ms S 0 kB R 0 kB ćš
đ¶ G 57 ms I 57 ms T 0 ms S 0 kB R 0 kB Spring
đ¶ G 55 ms I 53 ms T 1 ms S 0 kB R 0 kB ćźč
đ¶ G 40 ms I 40 ms T 0 ms S 0 kB R 0 kB ćš
đ¶ G 47 ms I 47 ms T 0 ms S 0 kB R 0 kB äž
đ¶ G 56 ms I 54 ms T 2 ms S 0 kB R 0 kB ïŒ
đ¶ G 45 ms I 45 ms T 0 ms S 0 kB R 0 kB ç¶
đ¶ G 42 ms I 42 ms T 0 ms S 0 kB R 0 kB ć
Truthfully, we could probably just have it allocate the buffer on the heap, using the vector approach I used for Windows support, when not running as sudo. The reason sudo is needed is that it tries to lock the allocation in physical memory, and without sudo this fails. I'm surprised inference still works even though the model couldn't be loaded, though. My guess is that when you're not running as sudo the model weights are just all zeros, so the calculations only consider the input and the output is basically random noise?
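To make the idea concrete: dllama itself is C++, so this is only an illustrative Python sketch (alloc_weights is a made-up name, and it assumes a Linux libc). The point is to try to pin the mapping with mlock and, if that fails because the process lacks the privilege, keep the unlocked allocation and warn instead of aborting:

import ctypes
import ctypes.util
import mmap

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

def alloc_weights(nbytes: int) -> mmap.mmap:
    # Anonymous mapping that will hold the tensor data.
    buf = mmap.mmap(-1, nbytes)
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    # mlock() pins the pages in physical RAM; it needs root/CAP_IPC_LOCK
    # (or a large enough RLIMIT_MEMLOCK), otherwise it fails.
    if libc.mlock(ctypes.c_void_p(addr), ctypes.c_size_t(nbytes)) != 0:
        # Keep going with ordinary, unlocked memory instead of aborting.
        print(f"Cannot lock {nbytes} bytes in RAM, falling back to unlocked memory")
    return buf

weights = alloc_weights(6488064)  # one of the tensor sizes from the log above

With sudo the mlock succeeds and nothing is printed; without it you get the warning but still a usable buffer, which is roughly what the heap/vector fallback mentioned above aims for.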
Confirmed: I can now run dllama without sudo. The irony is that the fix is part of the Windows support PR.
./dllama inference --model /mnt/d/Meta-Llama-3-8B-Instruct-Distributed/dllama_original_q40.bin --tokenizer /mnt/d/Meta-Llama-3-8B-Instruct-Distributed/dllama-llama3-tokenizer.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Hello world" đĄ arch: llama2 đĄ dim: 4096 đĄ hiddenDim: 14336 đĄ nLayers: 32 đĄ nHeads: 32 đĄ nKvHeads: 8 đĄ vocabSize: 128256 đĄ seqLen: 2048 đĄ nSlices: 1 đĄ ropeTheta: 500000.0 đ bosId: 128000 đ eosId: 128001 mmap succeeded. data = 0x7f8c7260a000 weights = 0x7f8c7260a060 đ ropeCache: 32768 kB â© Loaded 6175568 kB đ¶ G 421 ms I 421 ms T 0 ms S 0 kB R 0 kB Hello đ¶ G 382 ms I 382 ms T 0 ms S 0 kB R 0 kB world đ¶ G 421 ms I 420 ms T 0 ms S 0 kB R 0 kB ! đ¶ G 385 ms I 384 ms T 0 ms S 0 kB R 0 kB This đ¶ G 390 ms I 389 ms T 0 ms S 0 kB R 0 kB is đ¶ G 377 ms I 377 ms T 0 ms S 0 kB R 0 kB a đ¶ G 389 ms I 387 ms T 1 ms S 0 kB R 0 kB test đ¶ G 395 ms I 395 ms T 0 ms S 0 kB R 0 kB of đ¶ G 381 ms I 380 ms T 1 ms S 0 kB R 0 kB the đ¶ G 376 ms I 374 ms T 1 ms S 0 kB R 0 kB emergency đ¶ G 453 ms I 451 ms T 2 ms S 0 kB R 0 kB broadcast đ¶ G 421 ms I 420 ms T 1 ms S 0 kB R 0 kB system đ¶ G 423 ms I 421 ms T 1 ms S 0 kB R 0 kB .
ubuntu@ubuntu:~/distributed-llama$ sudo nice -n -20 dllama inference --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Tell me about yourself." --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998
đĄ arch: llama
đĄ hiddenAct: silu
đĄ dim: 2048
đĄ hiddenDim: 5632
đĄ nLayers: 22
đĄ nHeads: 32
đĄ nKvHeads: 4
đĄ vocabSize: 32000
đĄ seqLen: 2048
đĄ nSlices: 8
đĄ ropeTheta: 10000.0
đ bosId: 1
đ eosId: 2
đ ropeCache: 2048 kB
â© Loaded 824584 kB
đ¶ G 396 ms I 302 ms T 94 ms S 466351 kB R 654 kB Tell
đ¶ G 419 ms I 329 ms T 89 ms S 654 kB R 654 kB me
đ¶ G 396 ms I 312 ms T 84 ms S 654 kB R 654 kB about
đ¶ G 380 ms I 304 ms T 76 ms S 654 kB R 654 kB yourself
đ¶ G 401 ms I 312 ms T 89 ms S 654 kB R 654 kB .
đ¶ G 418 ms I 308 ms T 101 ms S 654 kB R 654 kB DD
đ¶ G 395 ms I 313 ms T 73 ms S 654 kB R 654 kB CO
đ¶ G 391 ms I 316 ms T 66 ms S 654 kB R 654 kB cou
đ¶ G 295 ms I 208 ms T 83 ms S 654 kB R 654 kB ton
đ¶ G 399 ms I 313 ms T 76 ms S 654 kB R 654 kB WN
đ¶ G 393 ms I 306 ms T 76 ms S 654 kB R 654 kB TC
đ¶ G 392 ms I 311 ms T 71 ms S 654 kB R 654 kB v
đ¶ G 391 ms I 301 ms T 80 ms S 654 kB R 654 kB i
đ¶ G 390 ms I 312 ms T 69 ms S 654 kB R 654 kB D
đ¶ G 398 ms I 304 ms T 83 ms S 654 kB R 654 kB ~
đ¶ G 390 ms I 307 ms T 75 ms S 654 kB R 654 kB mk
đ¶ G 397 ms I 306 ms T 82 ms S 654 kB R 654 kB Đ
đ¶ G 301 ms I 211 ms T 85 ms S 654 kB R 654 kB another
đ¶ G 387 ms I 310 ms T 68 ms S 654 kB R 654 kB ti
đ¶ G 392 ms I 311 ms T 73 ms S 654 kB R 654 kB ~
đ¶ G 392 ms I 309 ms T 74 ms S 654 kB R 654 kB ~
đ¶ G 395 ms I 308 ms T 78 ms S 654 kB R 654 kB D
đ¶ G 393 ms I 305 ms T 77 ms S 654 kB R 654 kB ~
đ¶ G 396 ms I 318 ms T 69 ms S 654 kB R 654 kB ~
đ¶ G 390 ms I 304 ms T 78 ms S 654 kB R 654 kB ~
đ¶ G 390 ms I 308 ms T 74 ms S 654 kB R 654 kB ~
đ¶ G 299 ms I 221 ms T 74 ms S 654 kB R 654 kB of
đ¶ G 382 ms I 300 ms T 74 ms S 654 kB R 654 kB â
đ¶ G 391 ms I 312 ms T 70 ms S 654 kB R 654 kB ~
đ¶ G 390 ms I 304 ms T 77 ms S 654 kB R 654 kB K
đ¶ G 390 ms I 307 ms T 75 ms S 654 kB R 654 kB ~
đ¶ G 389 ms I 311 ms T 70 ms S 654 kB R 654 kB !
đ¶ G 395 ms I 309 ms T 77 ms S 654 kB R 654 kB properly
đ¶ G 389 ms I 305 ms T 77 ms S 654 kB R 654 kB ~
đ¶ G 391 ms I 306 ms T 76 ms S 654 kB R 654 kB ~
đ¶ G 320 ms I 234 ms T 83 ms S 654 kB R 654 kB N
đ¶ G 389 ms I 290 ms T 83 ms S 654 kB R 654 kB id
đ¶ G 395 ms I 307 ms T 79 ms S 654 kB R 654 kB ~
đ¶ G 391 ms I 307 ms T 75 ms S 654 kB R 654 kB redirect
đ¶ G 388 ms I 308 ms T 73 ms S 654 kB R 654 kB ~
đ¶ G 399 ms I 306 ms T 84 ms S 654 kB R 654 kB ~
đ¶ G 395 ms I 306 ms T 80 ms S 654 kB R 654 kB NEW
đ¶ G 398 ms I 304 ms T 84 ms S 654 kB R 654 kB ~
đ¶ G 392 ms I 303 ms T 79 ms S 654 kB R 654 kB Mode
đ¶ G 309 ms I 235 ms T 70 ms S 654 kB R 654 kB ~
đ¶ G 385 ms I 287 ms T 89 ms S 654 kB R 654 kB userId
đ¶ G 391 ms I 301 ms T 81 ms S 654 kB R 654 kB ~
đ¶ G 397 ms I 301 ms T 87 ms S 654 kB R 654 kB Before
đ¶ G 394 ms I 305 ms T 79 ms S 654 kB R 654 kB ----
đ¶ G 508 ms I 426 ms T 72 ms S 654 kB R 654 kB ute
đ¶ G 411 ms I 313 ms T 89 ms S 654 kB R 654 kB Dim
đ¶ G 391 ms I 306 ms T 76 ms S 654 kB R 654 kB vern
đ¶ G 392 ms I 303 ms T 80 ms S 654 kB R 654 kB
đ¶ G 367 ms I 258 ms T 100 ms S 654 kB R 654 kB udi
đ¶ G 394 ms I 306 ms T 79 ms S 654 kB R 654 kB away
đ¶ G 395 ms I 302 ms T 85 ms S 654 kB R 654 kB ~
đ¶ G 393 ms I 305 ms T 80 ms S 654 kB R 654 kB ton
đ¶ G 393 ms I 304 ms T 80 ms S 654 kB R 654 kB tocol
đ¶ G 399 ms I 310 ms T 80 ms S 654 kB R 654 kB coun
đ¶ G 392 ms I 302 ms T 80 ms S 654 kB R 654 kB Counter
đ¶ G 390 ms I 301 ms T 80 ms S 654 kB R 654 kB arts
đ¶ G 391 ms I 304 ms T 78 ms S 654 kB R 654 kB A
đ¶ G 374 ms I 259 ms T 107 ms S 654 kB R 654 kB ene
đ¶ G 393 ms I 304 ms T 81 ms S 654 kB R 654 kB ~
Generated tokens: 64
Avg tokens / second: 2.58
Avg generation time: 387.80 ms
Avg inference time: 300.31 ms
Avg transfer time: 79.47 ms
Are your worker nodes also running the same version?
I pulled the latest version from git, built from source, used the downloader to fetch TinyLlama, and ran it as per the instructions, and mine worked just fine. The only difference I could spot was that you were running with additional workers.
The possible reasons I can think of are that one or more nodes are running older versions of dllama, or that some ARM-specific code broke in a recent pull request, though I doubt that's the case.
The workflows test functionality on both ARM and x86 processor architectures, but they don't exactly test the multiple-worker functionality, so it might be something that's broken only in a multi-node setup, or it could just be that you didn't update the nodes to the latest version.
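One quick way to rule out a version mismatch is to compare a hash of the dllama binary on every node against the root's copy. A rough sketch, assuming passwordless ssh as ubuntu to each Pi and that the binary sits at ~/distributed-llama/dllama (adjust the worker list and path to your setup):

import subprocess

WORKERS = ["192.168.2.212", "192.168.2.213", "192.168.2.214", "192.168.2.215",
           "192.168.2.216", "192.168.2.217", "192.168.2.218"]
BINARY = "~/distributed-llama/dllama"

def digest(cmd):
    # sha256sum prints "<digest>  <path>"; keep only the digest.
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return out.stdout.split()[0]

local = digest(["sha256sum", "dllama"])
for host in WORKERS:
    remote = digest(["ssh", f"ubuntu@{host}", f"sha256sum {BINARY}"])
    print(host, "OK" if remote == local else f"MISMATCH ({remote[:12]})")

If any node reports a mismatch, re-scp the freshly built binary there (or rebuild on that node) before re-testing.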
I compile on one 3B+ and then scp it to the other 3B+s. I was downloading TinyLlama on my Windows computer via WSL2 and converting it with the Python env in there; the most recent time, which I just posted here, I used the Python download script. I just rm'd all the dllama executables and then re-scp'd the executable. Same result.
ubuntu@ubuntu:~/distributed-llama$ sudo nice -n -20 dllama inference --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Tell me about yourself." --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998
đĄ arch: llama
đĄ hiddenAct: silu
đĄ dim: 2048
đĄ hiddenDim: 5632
đĄ nLayers: 22
đĄ nHeads: 32
đĄ nKvHeads: 4
đĄ vocabSize: 32000
đĄ seqLen: 2048
đĄ nSlices: 8
đĄ ropeTheta: 10000.0
đ bosId: 1
đ eosId: 2
đ ropeCache: 2048 kB
â© Loaded 824584 kB
đ¶ G 401 ms I 315 ms T 86 ms S 466351 kB R 654 kB Tell
đ¶ G 539 ms I 456 ms T 83 ms S 654 kB R 654 kB me
đ¶ G 436 ms I 325 ms T 111 ms S 654 kB R 654 kB about
đ¶ G 404 ms I 306 ms T 98 ms S 654 kB R 654 kB yourself
đ¶ G 384 ms I 306 ms T 77 ms S 654 kB R 654 kB .
đ¶ G 392 ms I 304 ms T 78 ms S 654 kB R 654 kB fmt
đ¶ G 391 ms I 305 ms T 77 ms S 654 kB R 654 kB Èi
đ¶ G 394 ms I 313 ms T 71 ms S 654 kB R 654 kB REE
đ¶ G 373 ms I 269 ms T 94 ms S 654 kB R 654 kB int
đ¶ G 391 ms I 309 ms T 72 ms S 654 kB R 654 kB DIS
đ¶ G 397 ms I 320 ms T 67 ms S 654 kB R 654 kB NUM
đ¶ G 395 ms I 313 ms T 71 ms S 654 kB R 654 kB ART
đ¶ G 396 ms I 310 ms T 75 ms S 654 kB R 654 kB nad
đ¶ G 395 ms I 304 ms T 81 ms S 654 kB R 654 kB redirects
đ¶ G 392 ms I 304 ms T 78 ms S 654 kB R 654 kB qualified
đ¶ G 393 ms I 305 ms T 79 ms S 654 kB R 654 kB help
đ¶ G 365 ms I 280 ms T 80 ms S 654 kB R 654 kB COUNT
đ¶ G 376 ms I 271 ms T 94 ms S 654 kB R 654 kB is
đ¶ G 394 ms I 309 ms T 75 ms S 654 kB R 654 kB T
đ¶ G 396 ms I 312 ms T 76 ms S 654 kB R 654 kB npm
đ¶ G 395 ms I 303 ms T 82 ms S 654 kB R 654 kB -
đ¶ G 393 ms I 310 ms T 74 ms S 654 kB R 654 kB noindent
đ¶ G 391 ms I 309 ms T 73 ms S 654 kB R 654 kB ini
đ¶ G 398 ms I 310 ms T 78 ms S 654 kB R 654 kB over
đ¶ G 394 ms I 301 ms T 83 ms S 654 kB R 654 kB \\
đ¶ G 336 ms I 254 ms T 79 ms S 654 kB R 654 kB ve
đ¶ G 379 ms I 291 ms T 77 ms S 654 kB R 654 kB so
đ¶ G 395 ms I 305 ms T 80 ms S 654 kB R 654 kB cer
đ¶ G 394 ms I 312 ms T 71 ms S 654 kB R 654 kB ĐČ
đ¶ G 394 ms I 311 ms T 73 ms S 654 kB R 654 kB ~
đ¶ G 394 ms I 294 ms T 91 ms S 654 kB R 654 kB on
đ¶ G 395 ms I 300 ms T 84 ms S 654 kB R 654 kB ~
đ¶ G 394 ms I 304 ms T 81 ms S 654 kB R 654 kB urale
đ¶ G 394 ms I 308 ms T 75 ms S 654 kB R 654 kB ivers
đ¶ G 324 ms I 243 ms T 77 ms S 654 kB R 654 kB jud
đ¶ G 384 ms I 292 ms T 82 ms S 654 kB R 654 kB ute
đ¶ G 399 ms I 316 ms T 73 ms S 654 kB R 654 kB --
đ¶ G 392 ms I 306 ms T 77 ms S 654 kB R 654 kB ___
đ¶ G 391 ms I 308 ms T 74 ms S 654 kB R 654 kB ~
đ¶ G 395 ms I 302 ms T 84 ms S 654 kB R 654 kB ___
đ¶ G 393 ms I 302 ms T 82 ms S 654 kB R 654 kB w
đ¶ G 393 ms I 310 ms T 73 ms S 654 kB R 654 kB right
đ¶ G 394 ms I 311 ms T 73 ms S 654 kB R 654 kB is
đ¶ G 317 ms I 234 ms T 79 ms S 654 kB R 654 kB Ë
đ¶ G 382 ms I 294 ms T 78 ms S 654 kB R 654 kB where
đ¶ G 400 ms I 311 ms T 79 ms S 654 kB R 654 kB head
đ¶ G 394 ms I 307 ms T 77 ms S 654 kB R 654 kB __
đ¶ G 396 ms I 304 ms T 83 ms S 654 kB R 654 kB ----
đ¶ G 395 ms I 305 ms T 80 ms S 654 kB R 654 kB â
đ¶ G 401 ms I 317 ms T 73 ms S 654 kB R 654 kB `-
đ¶ G 394 ms I 309 ms T 75 ms S 654 kB R 654 kB li
đ¶ G 395 ms I 309 ms T 76 ms S 654 kB R 654 kB from
đ¶ G 307 ms I 220 ms T 83 ms S 654 kB R 654 kB __
đ¶ G 384 ms I 298 ms T 77 ms S 654 kB R 654 kB idente
đ¶ G 393 ms I 307 ms T 76 ms S 654 kB R 654 kB gen
đ¶ G 395 ms I 315 ms T 70 ms S 654 kB R 654 kB wedge
đ¶ G 394 ms I 314 ms T 71 ms S 654 kB R 654 kB unic
đ¶ G 394 ms I 315 ms T 70 ms S 654 kB R 654 kB dim
đ¶ G 394 ms I 307 ms T 77 ms S 654 kB R 654 kB weis
đ¶ G 396 ms I 310 ms T 77 ms S 654 kB R 654 kB ligen
đ¶ G 395 ms I 301 ms T 84 ms S 654 kB R 654 kB Ăș
đ¶ G 304 ms I 224 ms T 76 ms S 654 kB R 654 kB wid
đ¶ G 389 ms I 301 ms T 79 ms S 654 kB R 654 kB ute
đ¶ G 396 ms I 309 ms T 78 ms S 654 kB R 654 kB w
Generated tokens: 64
Avg tokens / second: 2.57
Avg generation time: 389.53 ms
Avg inference time: 302.33 ms
Avg transfer time: 78.70 ms
That's so strange. I just did a test with multiple workers running from the same machine instead of multiple machines, though it's x86 and not ARM.
Root: sudo nice -n 20 ./dllama inference --model ~/distributed-llama/models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer ~/distributed-llama/models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Love is" --workers 127.0.0.1:11211 đĄ arch: llama đĄ hiddenAct: silu đĄ dim: 2048 đĄ hiddenDim: 5632 đĄ nLayers: 22 đĄ nHeads: 32 đĄ nKvHeads: 4 đĄ vocabSize: 32000 đĄ seqLen: 2048 đĄ nSlices: 2 đĄ ropeTheta: 10000.0 đ bosId: 1 đ eosId: 2 đ ropeCache: 8192 kB â© Loaded 824584 kB đ¶ G 63 ms I 42 ms T 21 ms S 266205 kB R 93 kB Love đ¶ G 72 ms I 41 ms T 30 ms S 93 kB R 93 kB is đ¶ G 73 ms I 41 ms T 32 ms S 93 kB R 93 kB Fore đ¶ G 61 ms I 32 ms T 29 ms S 93 kB R 93 kB ver đ¶ G 63 ms I 40 ms T 22 ms S 93 kB R 93 kB , đ¶ G 61 ms I 42 ms T 19 ms S 93 kB R 93 kB I đ¶ G 59 ms I 38 ms T 21 ms S 93 kB R 93 kB Can đ¶ G 74 ms I 42 ms T 32 ms S 93 kB R 93 kB Only đ¶ G 70 ms I 41 ms T 28 ms S 93 kB R 93 kB Im đ¶ G 73 ms I 36 ms T 36 ms S 93 kB R 93 kB agine đ¶ G 66 ms I 46 ms T 19 ms S 93 kB R 93 kB , đ¶ G 63 ms I 36 ms T 26 ms S 93 kB R 93 kB Jo đ¶ G 63 ms I 41 ms T 21 ms S 93 kB R 93 kB Jo đ¶ G 59 ms I 40 ms T 19 ms S 93 kB R 93 kB Gun đ¶ G 56 ms I 32 ms T 23 ms S 93 kB R 93 kB ne đ¶ G 59 ms I 34 ms T 25 ms S 93 kB R 93 kB , đ¶ G 69 ms I 33 ms T 35 ms S 93 kB R 93 kB Jer đ¶ G 70 ms I 33 ms T 37 ms S 93 kB R 93 kB emy đ¶ G 73 ms I 32 ms T 41 ms S 93 kB R 93 kB Camp đ¶ G 77 ms I 41 ms T 36 ms S 93 kB R 93 kB , đ¶ G 68 ms I 41 ms T 26 ms S 93 kB R 93 kB K đ¶ G 72 ms I 39 ms T 33 ms S 93 kB R 93 kB aty đ¶ G 75 ms I 37 ms T 38 ms S 93 kB R 93 kB Perry đ¶ G 77 ms I 40 ms T 37 ms S 93 kB R 93 kB , đ¶ G 77 ms I 42 ms T 34 ms S 93 kB R 93 kB Kid đ¶ G 75 ms I 37 ms T 38 ms S 93 kB R 93 kB Rock đ¶ G 78 ms I 42 ms T 35 ms S 93 kB R 93 kB , đ¶ G 82 ms I 41 ms T 40 ms S 93 kB R 93 kB Lady đ¶ G 82 ms I 42 ms T 40 ms S 93 kB R 93 kB An đ¶ G 70 ms I 40 ms T 30 ms S 93 kB R 93 kB te đ¶ G 74 ms I 39 ms T 35 ms S 93 kB R 93 kB bell đ¶ G 69 ms I 43 ms T 26 ms S 93 kB R 93 kB um
Worker: ./dllama worker --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --port 11211
Both running from the same machine inside WSL.
I unfortunately don't have any ARM hardware to test with currently, but it could be related to that.
Another test
sudo nice -n 20 ./dllama inference --model ~/distributed-llama/models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer ~/distributed-llama/models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Python is a programming language that" --workers 127.0.0.1:11211 đĄ arch: llama đĄ hiddenAct: silu đĄ dim: 2048 đĄ hiddenDim: 5632 đĄ nLayers: 22 đĄ nHeads: 32 đĄ nKvHeads: 4 đĄ vocabSize: 32000 đĄ seqLen: 2048 đĄ nSlices: 2 đĄ ropeTheta: 10000.0 đ bosId: 1 đ eosId: 2 đ ropeCache: 8192 kB â© Loaded 824584 kB đ¶ G 77 ms I 34 ms T 43 ms S 266205 kB R 93 kB Python đ¶ G 72 ms I 29 ms T 43 ms S 93 kB R 93 kB is đ¶ G 67 ms I 38 ms T 29 ms S 93 kB R 93 kB a đ¶ G 75 ms I 40 ms T 35 ms S 93 kB R 93 kB programming đ¶ G 65 ms I 32 ms T 33 ms S 93 kB R 93 kB language đ¶ G 68 ms I 40 ms T 28 ms S 93 kB R 93 kB that đ¶ G 71 ms I 39 ms T 32 ms S 93 kB R 93 kB is đ¶ G 59 ms I 42 ms T 17 ms S 93 kB R 93 kB open đ¶ G 67 ms I 30 ms T 37 ms S 93 kB R 93 kB source đ¶ G 70 ms I 34 ms T 35 ms S 93 kB R 93 kB and đ¶ G 57 ms I 43 ms T 14 ms S 93 kB R 93 kB free đ¶ G 64 ms I 46 ms T 18 ms S 93 kB R 93 kB to đ¶ G 59 ms I 46 ms T 13 ms S 93 kB R 93 kB use đ¶ G 59 ms I 38 ms T 21 ms S 93 kB R 93 kB . đ¶ G 61 ms I 47 ms T 14 ms S 93 kB R 93 kB It đ¶ G 65 ms I 35 ms T 30 ms S 93 kB R 93 kB is đ¶ G 68 ms I 42 ms T 25 ms S 93 kB R 93 kB designed đ¶ G 61 ms I 38 ms T 23 ms S 93 kB R 93 kB for đ¶ G 65 ms I 46 ms T 19 ms S 93 kB R 93 kB ease đ¶ G 61 ms I 37 ms T 24 ms S 93 kB R 93 kB of đ¶ G 75 ms I 33 ms T 42 ms S 93 kB R 93 kB use đ¶ G 71 ms I 38 ms T 33 ms S 93 kB R 93 kB , đ¶ G 68 ms I 30 ms T 38 ms S 93 kB R 93 kB flex đ¶ G 72 ms I 36 ms T 36 ms S 93 kB R 93 kB ibility đ¶ G 73 ms I 38 ms T 35 ms S 93 kB R 93 kB and đ¶ G 71 ms I 40 ms T 30 ms S 93 kB R 93 kB efficiency đ¶ G 69 ms I 34 ms T 35 ms S 93 kB R 93 kB .
I'm going to check if I can spin up a VM on Azure to test whether it's an ARM-specific issue.
WSL HOST:
musclez@NSA:~/distributed-llama$ sudo nice -n -20 ./dllama inference --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Tell me about yourself." --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998
[sudo] password for musclez:
đĄ arch: llama
đĄ hiddenAct: silu
đĄ dim: 2048
đĄ hiddenDim: 5632
đĄ nLayers: 22
đĄ nHeads: 32
đĄ nKvHeads: 4
đĄ vocabSize: 32000
đĄ seqLen: 2048
đĄ nSlices: 8
đĄ ropeTheta: 10000.0
đ bosId: 1
đ eosId: 2
đ ropeCache: 2048 kB
â© Loaded 824584 kB
đ¶ G 240 ms I 11 ms T 229 ms S 466351 kB R 654 kB Tell
đ¶ G 197 ms I 16 ms T 181 ms S 654 kB R 654 kB me
đ¶ G 223 ms I 13 ms T 210 ms S 654 kB R 654 kB about
đ¶ G 221 ms I 12 ms T 209 ms S 654 kB R 654 kB yourself
đ¶ G 219 ms I 10 ms T 209 ms S 654 kB R 654 kB .
đ¶ G 235 ms I 11 ms T 223 ms S 654 kB R 654 kB rows
đ¶ G 232 ms I 12 ms T 219 ms S 654 kB R 654 kB otti
đ¶ G 232 ms I 9 ms T 223 ms S 654 kB R 654 kB where
đ¶ G 266 ms I 15 ms T 250 ms S 654 kB R 654 kB otti
đ¶ G 202 ms I 12 ms T 189 ms S 654 kB R 654 kB ti
đ¶ G 197 ms I 13 ms T 183 ms S 654 kB R 654 kB ining
đ¶ G 203 ms I 13 ms T 189 ms S 654 kB R 654 kB >
đ¶ G 195 ms I 10 ms T 183 ms S 654 kB R 654 kB uden
đ¶ G 199 ms I 12 ms T 187 ms S 654 kB R 654 kB there
đ¶ G 203 ms I 12 ms T 190 ms S 654 kB R 654 kB ered
đ¶ G 214 ms I 12 ms T 201 ms S 654 kB R 654 kB COM
đ¶ G 207 ms I 8 ms T 198 ms S 654 kB R 654 kB otti
đ¶ G 210 ms I 10 ms T 199 ms S 654 kB R 654 kB otti
đ¶ G 213 ms I 11 ms T 202 ms S 654 kB R 654 kB Overflow
đ¶ G 211 ms I 15 ms T 196 ms S 654 kB R 654 kB nav
đ¶ G 213 ms I 13 ms T 199 ms S 654 kB R 654 kB nav
đ¶ G 195 ms I 13 ms T 180 ms S 654 kB R 654 kB isti
đ¶ G 204 ms I 11 ms T 191 ms S 654 kB R 654 kB enough
đ¶ G 222 ms I 9 ms T 211 ms S 654 kB R 654 kB sigu
đ¶ G 221 ms I 18 ms T 200 ms S 654 kB R 654 kB Beginn
đ¶ G 218 ms I 15 ms T 202 ms S 654 kB R 654 kB ani
đ¶ G 220 ms I 14 ms T 205 ms S 654 kB R 654 kB Overflow
đ¶ G 198 ms I 12 ms T 185 ms S 654 kB R 654 kB otti
đ¶ G 205 ms I 15 ms T 189 ms S 654 kB R 654 kB Jazz
đ¶ G 206 ms I 10 ms T 195 ms S 654 kB R 654 kB nu
đ¶ G 197 ms I 11 ms T 186 ms S 654 kB R 654 kB Đ»ĐžĐŒĐżĐž
đ¶ G 200 ms I 13 ms T 185 ms S 654 kB R 654 kB otti
đ¶ G 194 ms I 9 ms T 184 ms S 654 kB R 654 kB Overflow
đ¶ G 204 ms I 11 ms T 191 ms S 654 kB R 654 kB {}
đ¶ G 207 ms I 14 ms T 192 ms S 654 kB R 654 kB gen
đ¶ G 216 ms I 18 ms T 197 ms S 654 kB R 654 kB Overflow
đ¶ G 260 ms I 14 ms T 245 ms S 654 kB R 654 kB otti
đ¶ G 217 ms I 9 ms T 207 ms S 654 kB R 654 kB atti
đ¶ G 219 ms I 15 ms T 203 ms S 654 kB R 654 kB Frei
đ¶ G 207 ms I 12 ms T 194 ms S 654 kB R 654 kB dk
đ¶ G 232 ms I 12 ms T 219 ms S 654 kB R 654 kB Overflow
đ¶ G 213 ms I 10 ms T 203 ms S 654 kB R 654 kB Gar
đ¶ G 223 ms I 16 ms T 206 ms S 654 kB R 654 kB Overflow
đ¶ G 199 ms I 14 ms T 184 ms S 654 kB R 654 kB Gib
đ¶ G 215 ms I 9 ms T 205 ms S 654 kB R 654 kB Hunter
đ¶ G 222 ms I 10 ms T 211 ms S 654 kB R 654 kB Ășn
đ¶ G 220 ms I 9 ms T 209 ms S 654 kB R 654 kB agu
đ¶ G 220 ms I 16 ms T 203 ms S 654 kB R 654 kB Government
đ¶ G 205 ms I 10 ms T 194 ms S 654 kB R 654 kB Overflow
đ¶ G 196 ms I 9 ms T 186 ms S 654 kB R 654 kB otto
đ¶ G 198 ms I 11 ms T 186 ms S 654 kB R 654 kB amps
đ¶ G 222 ms I 10 ms T 211 ms S 654 kB R 654 kB Overflow
đ¶ G 200 ms I 18 ms T 180 ms S 654 kB R 654 kB Overflow
đ¶ G 195 ms I 11 ms T 183 ms S 654 kB R 654 kB Name
đ¶ G 200 ms I 12 ms T 187 ms S 654 kB R 654 kB vis
đ¶ G 209 ms I 11 ms T 197 ms S 654 kB R 654 kB Jenkins
đ¶ G 237 ms I 12 ms T 224 ms S 654 kB R 654 kB app
đ¶ G 205 ms I 19 ms T 185 ms S 654 kB R 654 kB Party
đ¶ G 195 ms I 11 ms T 184 ms S 654 kB R 654 kB amps
đ¶ G 209 ms I 12 ms T 196 ms S 654 kB R 654 kB Overflow
đ¶ G 202 ms I 15 ms T 186 ms S 654 kB R 654 kB Overflow
đ¶ G 212 ms I 10 ms T 201 ms S 654 kB R 654 kB Overflow
đ¶ G 193 ms I 14 ms T 178 ms S 654 kB R 654 kB quipe
đ¶ G 206 ms I 14 ms T 191 ms S 654 kB R 654 kB utes
Generated tokens: 64
Avg tokens / second: 4.72
Avg generation time: 212.03 ms
Avg inference time: 12.31 ms
Avg transfer time: 198.75 ms
I just created an EC2 ARM VM and ran the same test there; it worked perfectly fine. So the issue doesn't seem to be ARM-specific, at the very least. Not quite sure what is going on.
Perhaps try just the WSL root node, then add workers one at a time; maybe it's a problem with a single worker that's affecting the others. Either way, something strange is going on.
4 workers work, 8 do not. This was the same with WSL running the inference (root node) and with the Pi running the inference.
On WSL, however, you can see that it's actually saying "Overflow" when 8 are run. Intriguing.
from above:
đ¶ G 209 ms I 12 ms T 196 ms S 654 kB R 654 kB Overflow
đ¶ G 202 ms I 15 ms T 186 ms S 654 kB R 654 kB Overflow
đ¶ G 212 ms I 10 ms T 201 ms S 654 kB R 654 kB Overflow
4x working:
sudo nice -n -20 ./dllama inference --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Tell me about yourself." --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998
đĄ arch: llama
đĄ hiddenAct: silu
đĄ dim: 2048
đĄ hiddenDim: 5632
đĄ nLayers: 22
đĄ nHeads: 32
đĄ nKvHeads: 4
đĄ vocabSize: 32000
đĄ seqLen: 2048
đĄ nSlices: 4
đĄ ropeTheta: 10000.0
đ bosId: 1
đ eosId: 2
đ ropeCache: 4096 kB
â© Loaded 824584 kB
đ¶ G 352 ms I 13 ms T 339 ms S 399448 kB R 280 kB Tell
đ¶ G 323 ms I 12 ms T 311 ms S 280 kB R 280 kB me
đ¶ G 371 ms I 18 ms T 353 ms S 280 kB R 280 kB about
đ¶ G 344 ms I 21 ms T 322 ms S 280 kB R 280 kB yourself
đ¶ G 337 ms I 14 ms T 323 ms S 280 kB R 280 kB .
đ¶ G 365 ms I 19 ms T 346 ms S 280 kB R 280 kB
đ¶ G 353 ms I 14 ms T 339 ms S 280 kB R 280 kB NO
đ¶ G 358 ms I 12 ms T 346 ms S 280 kB R 280 kB W
đ¶ G 315 ms I 17 ms T 298 ms S 280 kB R 280 kB A
đ¶ G 344 ms I 15 ms T 329 ms S 280 kB R 280 kB BO
đ¶ G 336 ms I 20 ms T 316 ms S 280 kB R 280 kB UT
đ¶ G 364 ms I 12 ms T 352 ms S 280 kB R 280 kB Y
đ¶ G 336 ms I 14 ms T 322 ms S 280 kB R 280 kB OU
đ¶ G 350 ms I 19 ms T 331 ms S 280 kB R 280 kB :
đ¶ G 347 ms I 13 ms T 333 ms S 280 kB R 280 kB What
đ¶ G 369 ms I 13 ms T 356 ms S 280 kB R 280 kB was
đ¶ G 350 ms I 17 ms T 333 ms S 280 kB R 280 kB your
đ¶ G 404 ms I 16 ms T 388 ms S 280 kB R 280 kB first
đ¶ G 338 ms I 15 ms T 323 ms S 280 kB R 280 kB job
đ¶ G 319 ms I 14 ms T 305 ms S 280 kB R 280 kB ?
đ¶ G 436 ms I 19 ms T 416 ms S 280 kB R 280 kB
đ¶ G 336 ms I 22 ms T 314 ms S 280 kB R 280 kB It
đ¶ G 328 ms I 16 ms T 312 ms S 280 kB R 280 kB was
đ¶ G 362 ms I 16 ms T 346 ms S 280 kB R 280 kB a
đ¶ G 342 ms I 15 ms T 327 ms S 280 kB R 280 kB ret
đ¶ G 337 ms I 14 ms T 323 ms S 280 kB R 280 kB ail
đ¶ G 395 ms I 19 ms T 375 ms S 280 kB R 280 kB job
đ¶ G 343 ms I 18 ms T 325 ms S 280 kB R 280 kB ,
đ¶ G 345 ms I 16 ms T 329 ms S 280 kB R 280 kB but
đ¶ G 392 ms I 20 ms T 372 ms S 280 kB R 280 kB I
đ¶ G 330 ms I 14 ms T 315 ms S 280 kB R 280 kB was
đ¶ G 401 ms I 16 ms T 385 ms S 280 kB R 280 kB always
đ¶ G 355 ms I 23 ms T 332 ms S 280 kB R 280 kB interested
đ¶ G 369 ms I 17 ms T 351 ms S 280 kB R 280 kB in
đ¶ G 409 ms I 18 ms T 390 ms S 280 kB R 280 kB writing
đ¶ G 349 ms I 15 ms T 334 ms S 280 kB R 280 kB .
đ¶ G 344 ms I 17 ms T 327 ms S 280 kB R 280 kB I
đ¶ G 436 ms I 12 ms T 424 ms S 280 kB R 280 kB read
đ¶ G 333 ms I 14 ms T 319 ms S 280 kB R 280 kB lots
đ¶ G 350 ms I 18 ms T 331 ms S 280 kB R 280 kB of
đ¶ G 362 ms I 13 ms T 348 ms S 280 kB R 280 kB books
đ¶ G 359 ms I 18 ms T 341 ms S 280 kB R 280 kB and
đ¶ G 428 ms I 18 ms T 410 ms S 280 kB R 280 kB went
đ¶ G 331 ms I 15 ms T 316 ms S 280 kB R 280 kB to
đ¶ G 356 ms I 15 ms T 341 ms S 280 kB R 280 kB university
đ¶ G 383 ms I 20 ms T 363 ms S 280 kB R 280 kB to
đ¶ G 325 ms I 16 ms T 309 ms S 280 kB R 280 kB do
đ¶ G 359 ms I 12 ms T 347 ms S 280 kB R 280 kB a
đ¶ G 365 ms I 16 ms T 349 ms S 280 kB R 280 kB B
đ¶ G 322 ms I 15 ms T 306 ms S 280 kB R 280 kB A
đ¶ G 349 ms I 19 ms T 330 ms S 280 kB R 280 kB in
đ¶ G 409 ms I 21 ms T 388 ms S 280 kB R 280 kB English
đ¶ G 330 ms I 14 ms T 316 ms S 280 kB R 280 kB .
đ¶ G 356 ms I 13 ms T 343 ms S 280 kB R 280 kB
đ¶ G 373 ms I 18 ms T 355 ms S 280 kB R 280 kB HO
đ¶ G 317 ms I 14 ms T 302 ms S 280 kB R 280 kB W
đ¶ G 398 ms I 14 ms T 384 ms S 280 kB R 280 kB W
đ¶ G 347 ms I 15 ms T 332 ms S 280 kB R 280 kB AS
đ¶ G 332 ms I 14 ms T 318 ms S 280 kB R 280 kB IT
đ¶ G 388 ms I 22 ms T 366 ms S 280 kB R 280 kB ME
đ¶ G 349 ms I 17 ms T 332 ms S 280 kB R 280 kB ET
đ¶ G 324 ms I 19 ms T 305 ms S 280 kB R 280 kB ING
đ¶ G 358 ms I 13 ms T 345 ms S 280 kB R 280 kB Y
đ¶ G 345 ms I 18 ms T 327 ms S 280 kB R 280 kB OUR
Generated tokens: 64
Avg tokens / second: 2.80
Avg generation time: 356.75 ms
Avg inference time: 16.19 ms
Avg transfer time: 340.39 ms
ubuntu@ubuntu:~/distributed-llama$ sudo nice -n -20 ./dllama inference --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Tell me about yourself." --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998
đĄ arch: llama
đĄ hiddenAct: silu
đĄ dim: 2048
đĄ hiddenDim: 5632
đĄ nLayers: 22
đĄ nHeads: 32
đĄ nKvHeads: 4
đĄ vocabSize: 32000
đĄ seqLen: 2048
đĄ nSlices: 4
đĄ ropeTheta: 10000.0
đ bosId: 1
đ eosId: 2
đ ropeCache: 4096 kB
â© Loaded 824584 kB
đ¶ G 506 ms I 412 ms T 94 ms S 399448 kB R 280 kB Tell
đ¶ G 543 ms I 458 ms T 85 ms S 280 kB R 280 kB me
đ¶ G 486 ms I 432 ms T 54 ms S 280 kB R 280 kB about
đ¶ G 486 ms I 424 ms T 62 ms S 280 kB R 280 kB yourself
đ¶ G 488 ms I 428 ms T 60 ms S 280 kB R 280 kB .
đ¶ G 493 ms I 426 ms T 62 ms S 280 kB R 280 kB
đ¶ G 437 ms I 373 ms T 59 ms S 280 kB R 280 kB Over
đ¶ G 486 ms I 399 ms T 82 ms S 280 kB R 280 kB all
đ¶ G 525 ms I 425 ms T 95 ms S 280 kB R 280 kB ,
đ¶ G 497 ms I 421 ms T 70 ms S 280 kB R 280 kB I
đ¶ G 489 ms I 426 ms T 59 ms S 280 kB R 280 kB want
đ¶ G 489 ms I 431 ms T 53 ms S 280 kB R 280 kB to
đ¶ G 494 ms I 434 ms T 55 ms S 280 kB R 280 kB create
đ¶ G 452 ms I 389 ms T 61 ms S 280 kB R 280 kB a
đ¶ G 486 ms I 395 ms T 85 ms S 280 kB R 280 kB product
đ¶ G 489 ms I 425 ms T 60 ms S 280 kB R 280 kB that
đ¶ G 491 ms I 427 ms T 59 ms S 280 kB R 280 kB allows
đ¶ G 489 ms I 423 ms T 61 ms S 280 kB R 280 kB people
đ¶ G 492 ms I 429 ms T 58 ms S 280 kB R 280 kB to
đ¶ G 492 ms I 433 ms T 54 ms S 280 kB R 280 kB eng
đ¶ G 487 ms I 425 ms T 60 ms S 280 kB R 280 kB age
đ¶ G 482 ms I 377 ms T 101 ms S 280 kB R 280 kB with
đ¶ G 491 ms I 424 ms T 62 ms S 280 kB R 280 kB nature
đ¶ G 491 ms I 429 ms T 57 ms S 280 kB R 280 kB and
đ¶ G 492 ms I 430 ms T 57 ms S 280 kB R 280 kB have
đ¶ G 491 ms I 426 ms T 60 ms S 280 kB R 280 kB a
đ¶ G 490 ms I 429 ms T 57 ms S 280 kB R 280 kB real
đ¶ G 490 ms I 428 ms T 57 ms S 280 kB R 280 kB connection
đ¶ G 481 ms I 373 ms T 104 ms S 280 kB R 280 kB with
đ¶ G 498 ms I 432 ms T 62 ms S 280 kB R 280 kB the
đ¶ G 496 ms I 439 ms T 53 ms S 280 kB R 280 kB out
đ¶ G 491 ms I 430 ms T 56 ms S 280 kB R 280 kB do
đ¶ G 490 ms I 434 ms T 51 ms S 280 kB R 280 kB ors
đ¶ G 496 ms I 440 ms T 52 ms S 280 kB R 280 kB .
đ¶ G 490 ms I 431 ms T 54 ms S 280 kB R 280 kB
đ¶ G 482 ms I 380 ms T 97 ms S 280 kB R 280 kB My
đ¶ G 496 ms I 426 ms T 65 ms S 280 kB R 280 kB main
đ¶ G 492 ms I 426 ms T 61 ms S 280 kB R 280 kB goal
đ¶ G 491 ms I 431 ms T 56 ms S 280 kB R 280 kB for
đ¶ G 492 ms I 430 ms T 57 ms S 280 kB R 280 kB the
đ¶ G 498 ms I 430 ms T 63 ms S 280 kB R 280 kB next
đ¶ G 490 ms I 427 ms T 59 ms S 280 kB R 280 kB year
đ¶ G 481 ms I 374 ms T 103 ms S 280 kB R 280 kB is
đ¶ G 491 ms I 430 ms T 57 ms S 280 kB R 280 kB to
đ¶ G 491 ms I 427 ms T 59 ms S 280 kB R 280 kB work
đ¶ G 490 ms I 424 ms T 62 ms S 280 kB R 280 kB on
đ¶ G 491 ms I 429 ms T 57 ms S 280 kB R 280 kB the
đ¶ G 493 ms I 435 ms T 52 ms S 280 kB R 280 kB R
đ¶ G 492 ms I 431 ms T 56 ms S 280 kB R 280 kB ise
đ¶ G 485 ms I 375 ms T 105 ms S 280 kB R 280 kB +
đ¶ G 489 ms I 429 ms T 55 ms S 280 kB R 280 kB Fl
đ¶ G 491 ms I 432 ms T 55 ms S 280 kB R 280 kB ight
đ¶ G 494 ms I 435 ms T 53 ms S 280 kB R 280 kB brand
đ¶ G 496 ms I 444 ms T 48 ms S 280 kB R 280 kB .
đ¶ G 492 ms I 428 ms T 60 ms S 280 kB R 280 kB I
đ¶ G 491 ms I 429 ms T 58 ms S 280 kB R 280 kB want
đ¶ G 487 ms I 374 ms T 109 ms S 280 kB R 280 kB to
đ¶ G 492 ms I 435 ms T 53 ms S 280 kB R 280 kB create
đ¶ G 492 ms I 428 ms T 60 ms S 280 kB R 280 kB a
đ¶ G 496 ms I 430 ms T 61 ms S 280 kB R 280 kB brand
đ¶ G 497 ms I 433 ms T 60 ms S 280 kB R 280 kB that
đ¶ G 493 ms I 431 ms T 57 ms S 280 kB R 280 kB allows
đ¶ G 493 ms I 436 ms T 52 ms S 280 kB R 280 kB people
đ¶ G 483 ms I 372 ms T 106 ms S 280 kB R 280 kB to
Generated tokens: 64
Avg tokens / second: 2.04
Avg generation time: 490.73 ms
Avg inference time: 421.38 ms
Avg transfer time: 65.11 ms
Could you try to run 8 workers but with a single thread? --nthreads 1
?
He could also try running funcs-test on all the Pis.
I reproduced the problem: 8 nodes with 4 threads generate spaghetti. I'll look at this.
â© Loaded 824584 kB
đ¶ G 8052 ms I 4891 ms T 3161 ms S 466351 kB R 654 kB Hello
đ¶ G 6765 ms I 4108 ms T 2657 ms S 654 kB R 654 kB world
đ¶ G 11431 ms I 7125 ms T 4306 ms S 654 kB R 654 kB !
đ¶ G 10778 ms I 6435 ms T 4342 ms S 654 kB R 654 kB m
đ¶ G 10806 ms I 6676 ms T 4130 ms S 654 kB R 654 kB row
đ¶ G 12481 ms I 6907 ms T 5573 ms S 654 kB R 654 kB M
đ¶ G 11464 ms I 6865 ms T 4598 ms S 654 kB R 654 kB NO
Update: the same happens with 8 nodes and 1 thread:
đ¶ G 62 ms I 43 ms T 19 ms S 466351 kB R 654 kB Hello
đ¶ G 51 ms I 35 ms T 16 ms S 654 kB R 654 kB world
đ¶ G 46 ms I 34 ms T 12 ms S 654 kB R 654 kB !
đ¶ G 48 ms I 38 ms T 10 ms S 654 kB R 654 kB Dev
đ¶ G 49 ms I 31 ms T 18 ms S 654 kB R 654 kB ori
đ¶ G 50 ms I 36 ms T 13 ms S 654 kB R 654 kB IC
đ¶ G 46 ms I 41 ms T 5 ms S 654 kB R 654 kB M
đ¶ G 43 ms I 33 ms T 10 ms S 654 kB R 654 kB to
đ¶ G 46 ms I 33 ms T 12 ms S 654 kB R 654 kB web
đ¶ G 49 ms I 38 ms T 11 ms S 654 kB R 654 kB +
đ¶ G 52 ms I 32 ms T 20 ms S 654 kB R 654 kB small
Update: This problem appears with TinyLlama. Llama 3 8B works ok.
https://huggingface.co/keeeeenw/MicroLlama/tree/main
I was looking into this, but there is no tokenizer.model. I don't know enough about conversion yet. I see we're looking for HF Llama models that use the sentencepiece tokenizer, or Llama 3 models.
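Not official documentation, but one way to check up front whether a candidate repo even ships the sentencepiece tokenizer.model the converter needs (a rough sketch, assuming the huggingface_hub package is installed; the repo ids are just examples):

from huggingface_hub import list_repo_files

candidates = [
    "keeeeenw/MicroLlama",
    "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T",
]

for repo in candidates:
    files = list_repo_files(repo)
    has_spm = any(f.endswith("tokenizer.model") for f in files)
    print(f"{repo}: tokenizer.model {'found' if has_spm else 'missing'}")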
If there were some external documentation I could refer to, I would try to work with some other lightweight models that might fit in the 1 GB of memory.
I just got some 2 GB SBCs in the mail, so I could try to mix and match a bit to meet the memory demands of Llama 3. I may also just use TinyLlama with 4 Pis; that worked, so I don't really need 8.
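For rough sizing only (a back-of-the-envelope sketch that assumes the weights split about evenly across nodes and ignores the KV cache and transfer buffers), using the "Loaded 6175568 kB" figure reported for the Llama 3 8B q40 run earlier in the thread:

# Approximate per-node weight footprint for Llama 3 8B q40.
weights_kb = 6_175_568  # "Loaded 6175568 kB" from the single-machine run above
for n_nodes in (4, 8):
    per_node_mb = weights_kb / n_nodes / 1024
    print(f"{n_nodes} nodes: ~{per_node_mb:.0f} MB of weights per node")
# Roughly 1508 MB for 4 nodes and 754 MB for 8 nodes.

So eight 2 GB boards look comfortable for the weights alone, while 1 GB boards would be very tight once the KV cache and buffers are added.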
Is there any way that the main node and the workers could be separated, so I can use a cluster of 8 RPi 3B+ for the compute while the scheduling is offloaded to another device with more memory? I understand this is most likely not a priority. Perhaps a smaller model? https://github.com/jzhang38/TinyLlama ?
main:
Worker