b4rtaz / distributed-llama

Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.
MIT License

(Crashing on Low Memory SBC) main invoked oom-killer: gfp_mask=0x1100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0 #59

Closed unclemusclez closed 4 months ago

unclemusclez commented 4 months ago

Is there any way that main and worker could be separated so I can use a cluster of 8 RPi 3B+ for the compute, but offload the scheduling to another device with more memory (see the sketch below)? I understand this is most likely not a priority. Perhaps a smaller model? https://github.com/jzhang38/TinyLlama ?
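
Something like the sketch below is what I mean: the 3B+ boards only run workers, and the root runs on a machine with more RAM (if I understand the README right, the total node count, root included, has to be a power of two). The paths and worker addresses are just the ones from my setup below:

# on each Pi 3B+ (worker only, it holds just its slice of the weights):
sudo nice -n -20 main worker --port 9998 --nthreads 4
# on the box with more memory (root node, it drives scheduling and tokenization):
sudo main chat --weights-float-type q40 --buffer-float-type q80 --nthreads 4 \
  --model ~/dllama_meta-llama-3-8b_q40.bin --tokenizer ~/dllama-llama3-tokenizer.t \
  --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 \
            192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998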

main:

ubuntu@ubuntu:~/distributed-llama$ sudo main chat --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --model ~/dllama_meta-llama-3-8b_q40.bin --tokenizer ~/dllama-llama3-tokenizer.t --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:
💡 arch: llama2
💡 dim: 4096
💡 hiddenDim: 14336
💡 nLayers: 32
💡 nHeads: 32
💡 nKvHeads: 8
💡 vocabSize: 128256
💡 seqLen: 2048
💡 nSlices: 8
💡 ropeTheta: 500000.0
📄 bosId: 128000
📄 eosId: 128001
Killed

Worker:

ubuntu@ubuntu:~$ sudo nice -n -20 main worker --port 9998 --nthreads 4
Listening on 0.0.0.0:9998...
Client connected
terminate called after throwing an instance of 'ReadSocketException'
  what():  std::exception
Aborted
May 19 08:46:24 ubuntu kernel: [107061.602328] main invoked oom-killer: gfp_mask=0x1100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
May 19 08:46:24 ubuntu kernel: [107061.602392] CPU: 0 PID: 4676 Comm: main Tainted: G         C  E     5.15.0-1055-raspi #58-Ubuntu
May 19 08:46:24 ubuntu kernel: [107061.602412] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT)
May 19 08:46:24 ubuntu kernel: [107061.602423] Call trace:
May 19 08:46:24 ubuntu kernel: [107061.602430]  dump_backtrace+0x0/0x200
May 19 08:46:24 ubuntu kernel: [107061.602455]  show_stack+0x20/0x30
May 19 08:46:24 ubuntu kernel: [107061.602470]  dump_stack_lvl+0x8c/0xb8
May 19 08:46:24 ubuntu kernel: [107061.602490]  dump_stack+0x18/0x34
May 19 08:46:24 ubuntu kernel: [107061.602506]  dump_header+0x54/0x21c
May 19 08:46:24 ubuntu kernel: [107061.602520]  oom_kill_process+0x22c/0x230
May 19 08:46:24 ubuntu kernel: [107061.602539]  out_of_memory+0xf4/0x370
May 19 08:46:24 ubuntu kernel: [107061.602554]  __alloc_pages_slowpath.constprop.0+0x604/0x8e0
May 19 08:46:24 ubuntu kernel: [107061.602574]  __alloc_pages+0x29c/0x320
May 19 08:46:24 ubuntu kernel: [107061.602590]  alloc_zeroed_user_highpage_movable+0x40/0x50
May 19 08:46:24 ubuntu kernel: [107061.602607]  do_anonymous_page+0x88/0x4ec
May 19 08:46:24 ubuntu kernel: [107061.602628]  handle_pte_fault+0x170/0x1c0
May 19 08:46:24 ubuntu kernel: [107061.602642]  __handle_mm_fault+0x1d0/0x350
May 19 08:46:24 ubuntu kernel: [107061.602655]  handle_mm_fault+0x108/0x294
May 19 08:46:24 ubuntu kernel: [107061.602669]  faultin_page+0x84/0x150
May 19 08:46:24 ubuntu kernel: [107061.602685]  __get_user_pages+0x194/0x2c0
May 19 08:46:24 ubuntu kernel: [107061.602701]  populate_vma_page_range+0x64/0x70
May 19 08:46:24 ubuntu kernel: [107061.602719]  __mm_populate+0xc4/0x1d0
May 19 08:46:24 ubuntu kernel: [107061.602735]  do_mlock+0xdc/0x26c
May 19 08:46:24 ubuntu kernel: [107061.602750]  __arm64_sys_mlock+0x20/0x30
May 19 08:46:24 ubuntu kernel: [107061.602765]  invoke_syscall+0x50/0x120
May 19 08:46:24 ubuntu kernel: [107061.602784]  el0_svc_common.constprop.0+0x6c/0x1a0
May 19 08:46:24 ubuntu kernel: [107061.602803]  do_el0_svc+0x30/0xb0
May 19 08:46:24 ubuntu kernel: [107061.602820]  el0_svc+0x4c/0x170
May 19 08:46:24 ubuntu kernel: [107061.602837]  el0t_64_sync_handler+0xa4/0x130
May 19 08:46:24 ubuntu kernel: [107061.602854]  el0t_64_sync+0x1a4/0x1a8
May 19 08:46:24 ubuntu kernel: [107061.602888] Mem-Info:
May 19 08:46:24 ubuntu kernel: [107061.602905] active_anon:735 inactive_anon:16569 isolated_anon:0
May 19 08:46:24 ubuntu kernel: [107061.602905]  active_file:36 inactive_file:28 isolated_file:0
May 19 08:46:24 ubuntu kernel: [107061.602905]  unevictable:185356 dirty:0 writeback:0
May 19 08:46:24 ubuntu kernel: [107061.602905]  slab_reclaimable:6070 slab_unreclaimable:10550
May 19 08:46:24 ubuntu kernel: [107061.602905]  mapped:1869 shmem:749 pagetables:923 bounce:0
May 19 08:46:24 ubuntu kernel: [107061.602905]  kernel_misc_reclaimable:0
May 19 08:46:24 ubuntu kernel: [107061.602905]  free:5609 free_pcp:0 free_cma:0
May 19 08:46:24 ubuntu kernel: [107061.602949] Node 0 active_anon:2940kB inactive_anon:66276kB active_file:144kB inactive_file:112kB unevictable:741424kB isolated(anon):0kB isolated(file):0kB mapped:7476kB dirty:0kB writeback:0kB shmem:2996kB >
May 19 08:46:24 ubuntu kernel: [107061.602992] DMA free:22436kB min:24576kB low:30208kB high:35840kB reserved_highatomic:0KB active_anon:2940kB inactive_anon:66276kB active_file:196kB inactive_file:292kB unevictable:741332kB writepending:0kB p>
May 19 08:46:24 ubuntu kernel: [107061.603035] lowmem_reserve[]: 0 0 0 0
May 19 08:46:24 ubuntu kernel: [107061.603114] DMA: 1113*4kB (UME) 633*8kB (UME) 296*16kB (UME) 129*32kB (UME) 48*64kB (UME) 11*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 22860kB
May 19 08:46:24 ubuntu kernel: [107061.603406] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
May 19 08:46:24 ubuntu kernel: [107061.603428] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=32768kB
May 19 08:46:24 ubuntu kernel: [107061.603449] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
May 19 08:46:24 ubuntu kernel: [107061.603469] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=64kB
May 19 08:46:24 ubuntu kernel: [107061.603489] 2704 total pagecache pages
May 19 08:46:24 ubuntu kernel: [107061.603504] 0 pages in swap cache
May 19 08:46:24 ubuntu kernel: [107061.603518] Swap cache stats: add 0, delete 0, find 0/0
May 19 08:46:24 ubuntu kernel: [107061.603536] Free swap  = 0kB
May 19 08:46:24 ubuntu kernel: [107061.603550] Total swap = 0kB
May 19 08:46:24 ubuntu kernel: [107061.603565] 242688 pages RAM
May 19 08:46:24 ubuntu kernel: [107061.603580] 0 pages HighMem/MovableOnly
May 19 08:46:24 ubuntu kernel: [107061.603594] 10931 pages reserved
May 19 08:46:24 ubuntu kernel: [107061.603609] 16384 pages cma reserved
May 19 08:46:24 ubuntu kernel: [107061.603624] Tasks state (memory values in pages):
May 19 08:46:24 ubuntu kernel: [107061.603638] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
May 19 08:46:24 ubuntu kernel: [107061.603685] [    379]     0   379    12038      852    94208        0          -250 systemd-journal
May 19 08:46:24 ubuntu kernel: [107061.603716] [    406]     0   406    72414     6415   118784        0         -1000 multipathd
May 19 08:46:24 ubuntu kernel: [107061.603745] [    420]     0   420     5982      942    69632        0         -1000 systemd-udevd
May 19 08:46:24 ubuntu kernel: [107061.603789] [    553]   103   553    22163      732    77824        0             0 systemd-timesyn
May 19 08:46:24 ubuntu kernel: [107061.603819] [    612]   100   612     4068      777    73728        0             0 systemd-network
May 19 08:46:24 ubuntu kernel: [107061.603847] [    614]   101   614     6339     1633    90112        0             0 systemd-resolve
May 19 08:46:24 ubuntu kernel: [107061.603875] [    625]   102   625     2267      838    57344        0          -900 dbus-daemon
May 19 08:46:24 ubuntu kernel: [107061.603904] [    629]     0   629    20487      611    65536        0             0 irqbalance
May 19 08:46:24 ubuntu kernel: [107061.603933] [    634]     0   634     8236     2733   114688        0             0 networkd-dispat
May 19 08:46:24 ubuntu kernel: [107061.603961] [    640]   104   640    55504      826    81920        0             0 rsyslogd
May 19 08:46:24 ubuntu kernel: [107061.603989] [    644]     0   644   366640     2855   249856        0          -900 snapd
May 19 08:46:24 ubuntu kernel: [107061.604017] [    653]     0   653     3887      791    69632        0             0 systemd-logind
May 19 08:46:24 ubuntu kernel: [107061.604045] [    655]     0   655     3809      626    73728        0             0 wpa_supplicant
May 19 08:46:24 ubuntu kernel: [107061.604073] [    683]     0   683     1727      501    45056        0             0 cron
May 19 08:46:24 ubuntu kernel: [107061.604100] [    703]     0   703    27482     2589   110592        0             0 unattended-upgr
May 19 08:46:24 ubuntu kernel: [107061.604128] [    710]     0   710     1408      126    53248        0             0 agetty
May 19 08:46:24 ubuntu kernel: [107061.604155] [    712]     0   712     1397      139    49152        0             0 agetty
May 19 08:46:24 ubuntu kernel: [107061.604183] [    720]     0   720     3788     1039    69632        0         -1000 sshd
May 19 08:46:24 ubuntu kernel: [107061.604211] [    844]     0   844      559       44    36864        0             0 hciattach
May 19 08:46:24 ubuntu kernel: [107061.604239] [    856]     0   856     2384      602    61440        0             0 bluetoothd
May 19 08:46:24 ubuntu kernel: [107061.604266] [   1172]     0  1172    74368     1369   167936        0             0 packagekitd
May 19 08:46:24 ubuntu kernel: [107061.604305] [   1178]     0  1178    58582      814    94208        0             0 polkitd
May 19 08:46:24 ubuntu kernel: [107061.604336] [   4481]     0  4481     4596     1078    81920        0             0 sshd
May 19 08:46:24 ubuntu kernel: [107061.604364] [   4484]  1000  4484     4559     1187    73728        0             0 systemd
May 19 08:46:24 ubuntu kernel: [107061.604391] [   4485]  1000  4485    42829     1235   110592        0             0 (sd-pam)
May 19 08:46:24 ubuntu kernel: [107061.604421] [   4571]  1000  4571     4631      881    81920        0             0 sshd
May 19 08:46:24 ubuntu kernel: [107061.604448] [   4572]  1000  4572     2147      846    53248        0             0 bash
May 19 08:46:24 ubuntu kernel: [107061.604481] [   4674]  1000  4674     3345      616    61440        0             0 sudo
May 19 08:46:24 ubuntu kernel: [107061.604509] [   4675]  1000  4675     3345      172    61440        0             0 sudo
May 19 08:46:24 ubuntu kernel: [107061.604536] [   4676]     0  4676  1725546   180701  1495040        0             0 main
May 19 08:46:24 ubuntu kernel: [107061.604563] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-39.scope,task=main,pid=4676,uid=0
May 19 08:46:24 ubuntu kernel: [107061.604827] Out of memory: Killed process 4676 (main) total-vm:6902184kB, anon-rss:721280kB, file-rss:1524kB, shmem-rss:0kB, UID:0 pgtables:1460kB oom_score_adj:0
May 19 08:46:25 ubuntu systemd[1]: session-39.scope: A process of this unit has been killed by the OOM killer.
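
For what it's worth, the numbers in the log are consistent with the model simply not fitting: the root process had mapped ~6.9 GB of virtual memory (total-vm:6902184kB), had already locked ~720 MB resident (anon-rss:721280kB at kill time, and the call trace ends in __arm64_sys_mlock), and the board has 1 GB of RAM with no swap configured (Free swap = 0kB). A rough pre-flight check, assuming the model file and paths from the command above:

# on the root node: size of the packed model that gets sliced across the nodes
ls -lh ~/dllama_meta-llama-3-8b_q40.bin
# on every node: memory actually left for dllama
free -m
# with 8 slices each node needs roughly (model size / 8) for weights alone, plus
# KV cache and transfer buffers; mlock'd pages can't be swapped out, so adding
# swap would not rescue the weights on a 1 GB Pi 3B+
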
b4rtaz commented 4 months ago

I think a smaller model is the way to go for the RasPi 3. The converter needs to be adjusted a bit and it should work. I'll look at it soon.

unclemusclez commented 4 months ago

I think a smaller model is the way to go for the RasPi 3. The converter needs to be adjusted a bit and it should work. I'll look at it soon.

ballin

unclemusclez commented 4 months ago

apparently i should be able to use llama.cpp and mpi with rpi3b+. i assume dllama will offer some optimization? maybe i should just explore mpi for now? https://blog.cineneural.com/blog/2023-07/run-llama-llm-on-raspberry-pi-cluster/

zhengpeirong commented 4 months ago

apparently i should be able to use llama.cpp and mpi with rpi3b+. i assume dllama will offer some optimization? maybe i should just explore mpi for now? https://blog.cineneural.com/blog/2023-07/run-llama-llm-on-raspberry-pi-cluster/

llama.cpp uses pipeline parallelism, which produces high throughput only when the batch size is large. Moreover, its MPI backend has been broken since a certain commit. That's why we are here.

unclemusclez commented 4 months ago

alright good. i think that means i'm in the right place. i will be testing these SBC devices mostly, but frequently, if i can manage to get a database to load.

when discord?

b4rtaz commented 4 months ago

The first version of a general HF converter is here. You can try it. So far I tested it only with TinyLlama-1.1B:

  1. Download TinyLlama (a consolidated sketch of steps 1-3 follows the sample output below)
  2. Run the model converter: python3 convert-hf.py path/to/TinyLlama-1.1B q40 tinylama
  3. Run the tokenizer converter: python3 convert-tokenizer-sentencepiece.py path/to/tokenizer.model tinyllama
  4. Run the Distributed Llama:
b4rtaz@b4rtazs-MacBook-Pro distributed-llama % ./dllama generate --weights-float-type q40 --buffer-float-type q80 --nthreads 8 --steps 128 --model ../dllama_tinylama_q40.bin --tokenizer ../dllama_tinyllama.t --prompt "My name is Clara"
💡 arch: llama2
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 1
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🕒 ropeCache: 16384 kB
⏩ Loaded 824584 kB
My name is Clara. I am not your enemy. I just want to make sure that you and the world know that you are loved, and you are never alone.
[Page 215]
I feel a little more confident about him than I did a few hours ago. We have a lot of time together. He has all of his classes and other things to do, and he is at least a little used to me. It is probably safer for him to be here with me, and I am much more comfortable with him here.
I feel like I could ask him anything. He is not scared
Generated tokens:    128
Avg tokens / second: 47.23
Avg generation time: 21.17 ms
Avg inference time:  20.45 ms
Avg transfer time:   0.45 ms
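
For reference, steps 1-3 might look roughly like this on the machine doing the conversion (the repo URL, the git-lfs step, and the local paths are assumptions; the converter invocations are the ones from the list above):

git lfs install
git clone https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
cd distributed-llama/converter
python3 convert-hf.py ../../TinyLlama-1.1B-intermediate-step-1431k-3T q40 tinylama
python3 convert-tokenizer-sentencepiece.py ../../TinyLlama-1.1B-intermediate-step-1431k-3T/tokenizer.model tinyllama
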
unclemusclez commented 4 months ago

k brb

unclemusclez commented 4 months ago

seems like no dice?

~/distributed-llama-hf/converter$ python convert-hf.py ../../TinyLlama-1.1B-intermediate-step-1431k-3T q40 tinylama
Output file: dllama_model_tinylama_q40.m
Unknown header key: files
{'version': 0, 'arch_type': 11259136, 'hidden_act': 1, 'dim': 2048, 'hidden_dim': 5632, 'n_layers': 22, 'n_heads': 32, 'n_kv_heads': 4, 'weights_float_type': 2, 'max_seq_len': 2048, 'vocab_size': 32000, 'files': ['../../TinyLlama-1.1B-intermediate-step-1431k-3T/model.safetensors'], 'n_experts': 0, 'n_active_experts': 0}
💿 Loading file model.safetensors...
Found 201 layers
đŸ”¶ Writing tensor model.embed_tokens.weight torch.Size([32000, 2048])...
Saved f32 tensor in 2.61s, 262144000 bytes
đŸ”¶ Writing tensor model.layers.0.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.47s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.0.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.0.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.0.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.48s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.0.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.29s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.0.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.27s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.0.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.24s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.0.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.0.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.1.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.1.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.1.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.1.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.1.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.23s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.1.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.23s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.1.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.22s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.1.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.1.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.2.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.47s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.2.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.2.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.2.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.47s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.2.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.31s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.2.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.30s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.2.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.24s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.2.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.2.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.3.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.46s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.3.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.3.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.3.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.46s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.3.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.27s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.3.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.22s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.3.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.29s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.3.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.3.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.4.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.46s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.4.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.4.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.4.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.46s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.4.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.30s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.4.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.25s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.4.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.28s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.4.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.4.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.5.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.5.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.5.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.5.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.46s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.5.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.26s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.5.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.27s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.5.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.24s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.5.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.5.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.6.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.47s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.6.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.6.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.6.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.6.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.26s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.6.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.25s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.6.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.26s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.6.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.6.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.7.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.7.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.7.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.07s, 294912 bytes
đŸ”¶ Writing tensor model.layers.7.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.46s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.7.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.28s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.7.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.28s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.7.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.26s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.7.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.7.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.8.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.8.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.8.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.8.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.47s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.8.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.27s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.8.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.22s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.8.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.23s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.8.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.8.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.9.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.9.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.9.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.9.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.50s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.9.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.22s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.9.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.24s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.9.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.25s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.9.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.9.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.10.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.46s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.10.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.10.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.10.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.10.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.25s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.10.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.25s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.10.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.23s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.10.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.10.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.11.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.11.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.11.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.11.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.46s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.11.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.25s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.11.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.25s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.11.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.24s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.11.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.11.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.12.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.12.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.12.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.05s, 294912 bytes
đŸ”¶ Writing tensor model.layers.12.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.48s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.12.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.25s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.12.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.25s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.12.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.24s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.12.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.12.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.13.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.13.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.13.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.13.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.47s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.13.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.31s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.13.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.24s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.13.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.26s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.13.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.13.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.14.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.46s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.14.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.14.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.14.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.49s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.14.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.26s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.14.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.27s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.14.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.24s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.14.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.14.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.15.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.47s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.15.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.15.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.15.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.47s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.15.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.24s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.15.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.29s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.15.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.26s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.15.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.15.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.16.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.46s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.16.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.16.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.16.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.46s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.16.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.27s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.16.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.26s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.16.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.28s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.16.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.16.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.17.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.17.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.17.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.17.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.44s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.17.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.32s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.17.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.30s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.17.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.26s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.17.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.17.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.18.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.49s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.18.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.18.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.18.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.47s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.18.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.25s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.18.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.25s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.18.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.27s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.18.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.18.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.19.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.19.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.19.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.19.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.47s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.19.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.35s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.19.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.26s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.19.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.31s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.19.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.19.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.20.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.48s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.20.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.20.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.20.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.48s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.20.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.29s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.20.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.28s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.20.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.26s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.20.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.20.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.21.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.49s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.21.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.21.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
đŸ”¶ Writing tensor model.layers.21.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
đŸ”¶ Writing tensor model.layers.21.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.25s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.21.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.27s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.21.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.26s, 6488064 bytes
đŸ”¶ Writing tensor model.layers.21.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.layers.21.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor model.norm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
đŸ”¶ Writing tensor lm_head.weight torch.Size([32000, 2048])...
Saved q40 tensor in 7.50s, 36864000 bytes
✅ dllama_model_tinylama_q40.m created successfully

This console output got cut off in my terminal; only the tail of the tokenizer converter's output is shown below:

ćœ“ -30689.0
Ë -30690.0
★ -30691.0
ćŻș -30692.0
性 -30693.0
äčŸ -30694.0
め -30695.0
だ -30696.0
䜍 -30697.0
àŽ™ -30698.0
ہ -30699.0
ć€Œ -30700.0
ć€ -30701.0
გ -30702.0
àŠŹ -30703.0
陱 -30704.0
à”‡ -30705.0
▶ -30706.0
àź° -30707.0
界 -30708.0
èȘž -30709.0
àŽž -30710.0
수 -30711.0
ǒ -30712.0
愛 -30713.0
✔ -30714.0
時 -30715.0
ọ -30716.0
àŽ± -30717.0
ŐŽ -30718.0
ケ -30719.0
侜 -30720.0
搌 -30721.0
ìŁŒ -30722.0
保 -30723.0
Õ -30724.0
ố -30725.0
ጰ -30726.0
青 -30727.0
ギ -30728.0
䜓 -30729.0
æž… -30730.0
盾 -30731.0
àžˆ -30732.0
ŰĄ -30733.0
情 -30734.0
𝕜 -30735.0
àŠ• -30736.0
áž« -30737.0
ờ -30738.0
氆 -30739.0
族 -30740.0
동 -30741.0
΄ -30742.0
┌ -30743.0
ボ -30744.0
ćźź -30745.0
』 -30746.0
àŠź -30747.0
『 -30748.0
Č -30749.0
à€¶ -30750.0
àž› -30751.0
Ô± -30752.0
à€Ź -30753.0
자 -30754.0
æ”ż -30755.0
àźŸ -30756.0
问 -30757.0
ïŹ -30758.0
束 -30759.0
áčƒ -30760.0
構 -30761.0
æŻ -30762.0
民 -30763.0
教 -30764.0
获 -30765.0
戗 -30766.0
ćŒ€ -30767.0
ჹ -30768.0
ワ -30769.0
კ -30770.0
科 -30771.0
昄 -30772.0
æČ» -30773.0
搉 -30774.0
àœŠ -30775.0
àžš -30776.0
ɒ -30777.0
揰 -30778.0
ネ -30779.0
ှ -30780.0
Ä© -30781.0
ć·„ -30782.0
ᜱ -30783.0
矄 -30784.0
ć…« -30785.0
ć Ž -30786.0
画 -30787.0
癟 -30788.0
☆ -30789.0
蚘 -30790.0
ćŸ— -30791.0
ă‚œ -30792.0
氏 -30793.0
ာ -30794.0
에 -30795.0
àŠČ -30796.0
áč› -30797.0
慳 -30798.0
ÄĄ -30799.0
áœł -30800.0
∑ -30801.0
ベ -30802.0
标 -30803.0
니 -30804.0
ᜎ -30805.0
Ö” -30806.0
怖 -30807.0
♠ -30808.0
わ -30809.0
間 -30810.0
àž  -30811.0
æ Ą -30812.0
戶 -30813.0
àč -30814.0
抛 -30815.0
門 -30816.0
ć„œ -30817.0
ғ -30818.0
Ù -30819.0
ℓ -30820.0
Ö¶ -30821.0
는 -30822.0
┐ -30823.0
∗ -30824.0
指 -30825.0
è‰Č -30826.0
èż” -30827.0
銏 -30828.0
èŻ· -30829.0
≫ -30830.0
éąš -30831.0
áœč -30832.0
掄 -30833.0
서 -30834.0
↳ -30835.0
せ -30836.0
濗 -30837.0
ÌČ -30838.0
魔 -30839.0
ÒŁ -30840.0
曎 -30841.0
繋 -30842.0
êč€ -30843.0
郥 -30844.0
àœŒ -30845.0
Ć© -30846.0
àŽš -30847.0
戩 -30848.0
県 -30849.0
摹 -30850.0
そ -30851.0
や -30852.0
è°· -30853.0
驙 -30854.0
♯ -30855.0
じ -30856.0
ی -30857.0
期 -30858.0
∅ -30859.0
┘ -30860.0
戝 -30861.0
犏 -30862.0
片 -30863.0
ザ -30864.0
拕 -30865.0
揂 -30866.0
성 -30867.0
Ə -30868.0
╩ -30869.0
ì–Ž -30870.0
ჟ -30871.0
矩 -30872.0
à€š -30873.0
è±Ą -30874.0
抟 -30875.0
♂ -30876.0
도 -30877.0
êł  -30878.0
èż‡ -30879.0
ŐŸ -30880.0
皇 -30881.0
ç‰č -30882.0
áș­ -30883.0
长 -30884.0
英 -30885.0
áș„ -30886.0
àŽŁ -30887.0
ĐȘ -30888.0
àŠž -30889.0
ć…¶ -30890.0
àŠ€ -30891.0
攁 -30892.0
陀 -30893.0
음 -30894.0
ু -30895.0
្ -30896.0
æ°ž -30897.0
目 -30898.0
상 -30899.0
捃 -30900.0
áșŻ -30901.0
通 -30902.0
Ć€ -30903.0
朝 -30904.0
àźŸ -30905.0
ÉŁ -30906.0
捕 -30907.0
ʀ -30908.0
栌 -30909.0
ćŸ· -30910.0
전 -30911.0
â˜ș -30912.0
ピ -30913.0
歌 -30914.0
èż› -30915.0
限 -30916.0
怫 -30917.0
튾 -30918.0
⊱ -30919.0
朒 -30920.0
量 -30921.0
期 -30922.0
攟 -30923.0
码 -30924.0
等 -30925.0
çł» -30926.0
∌ -30927.0
èŻ -30928.0
↔ -30929.0
소 -30930.0
ćžž -30931.0
搊 -30932.0
芋 -30933.0
æș -30934.0
Ś -30935.0
漞 -30936.0
捚 -30937.0
띌 -30938.0
원 -30939.0
볎 -30940.0
⊕ -30941.0
è§Ł -30942.0
〜 -30943.0
男 -30944.0
àŠŠ -30945.0
ポ -30946.0
ろ -30947.0
나 -30948.0
àœ‚ -30949.0
無 -30950.0
Û -30951.0
Ì„ -30952.0
Ò± -30953.0
柄 -30954.0
ÌŁ -30955.0
╗ -30956.0
╩ -30957.0
æĄ -30958.0
àŠŻ -30959.0
ᜁ -30960.0
ćŸŒ -30961.0
他 -30962.0
眑 -30963.0
àźČ -30964.0
≃ -30965.0
화 -30966.0
ە -30967.0
阿 -30968.0
ေ -30969.0
户 -30970.0
∫ -30971.0
ê”Ź -30972.0
àœą -30973.0
မ -30974.0
▾ -30975.0
ŐŹ -30976.0
○ -30977.0
ć‘œ -30978.0
ć°± -30979.0
韍 -30980.0
搛 -30981.0
ć€ -30982.0
 -30983.0
蚀 -30984.0
慈 -30985.0
➜ -30986.0
ლ -30987.0
ძ -30988.0
àšŸ -30989.0
àź” -30990.0
ど -30991.0
ヒ -30992.0
àč„ -30993.0
àź© -30994.0
ば -30995.0
ゼ -30996.0
ŐŁ -30997.0
ጄ -30998.0
ダ -30999.0
慾 -31000.0
ćșœ -31001.0
̄ -31002.0
신 -31003.0
组 -31004.0
æ”č -31005.0
áœČ -31006.0
捎 -31007.0
侎 -31008.0
调 -31009.0
╝ -31010.0
ノ -31011.0
Ⴤ -31012.0
由 -31013.0
äżź -31014.0
ć­ž -31015.0
♣ -31016.0
消 -31017.0
珊 -31018.0
ʌ -31019.0
부 -31020.0
ớ -31021.0
‟ -31022.0
â–Č -31023.0
ćœ• -31024.0
àŽł -31025.0
연 -31026.0
을 -31027.0
ăČ -31028.0
영 -31029.0
─ -31030.0
ć·Č -31031.0
陜 -31032.0
င -31033.0
ê”­ -31034.0
ćźč -31035.0
æœȘ -31036.0
漗 -31037.0
ᮇ -31038.0
び -31039.0
임 -31040.0
韙 -31041.0
් -31042.0
提 -31043.0
ĝ -31044.0
慭 -31045.0
ćœą -31046.0
제 -31047.0
Հ -31048.0
䌊 -31049.0
Ï” -31050.0
àž‚ -31051.0
Ć° -31052.0
ゃ -31053.0
火 -31054.0
áčą -31055.0
䜐 -31056.0
⊄ -31057.0
ÌȘ -31058.0
ứ -31059.0
□ -31060.0
结 -31061.0
äč -31062.0
雄 -31063.0
Ő© -31064.0
ា -31065.0
而 -31066.0
àœ– -31067.0
우 -31068.0
ćŒ  -31069.0
à€Ÿ -31070.0
à€· -31071.0
搑 -31072.0
áż„ -31073.0
选 -31074.0
êł” -31075.0
ă‚Č -31076.0
ʐ -31077.0
仁 -31078.0
栂 -31079.0
Śš -31080.0
ု -31081.0
ጔ -31082.0
àŽ… -31083.0
ề -31084.0
àœ‘ -31085.0
선 -31086.0
였 -31087.0
äč… -31088.0
 -31089.0
äč‰ -31090.0
à€… -31091.0
╔ -31092.0
无 -31093.0
‹ -31094.0
은 -31095.0
Ê· -31096.0
那 -31097.0
線 -31098.0
抡 -31099.0
ćŸș -31100.0
汞 -31101.0
配 -31102.0
믞 -31103.0
軍 -31104.0
àč‚ -31105.0
掄 -31106.0
漌 -31107.0
研 -31108.0
æłš -31109.0
怱 -31110.0
ćș” -31111.0
က -31112.0
╚ -31113.0
揋 -31114.0
ç«  -31115.0
Κ -31116.0
求 -31117.0
à€Ł -31118.0
êČœ -31119.0
‬ -31120.0
à€­ -31121.0
仏 -31122.0
æšĄ -31123.0
需 -31124.0
àźš -31125.0
電 -31126.0
àŠȘ -31127.0
Ő€ -31128.0
ぞ -31129.0
æ­€ -31130.0
ć€œ -31131.0
或 -31132.0
橋 -31133.0
æ č -31134.0
ÄȘ -31135.0
玉 -31136.0
àžč -31137.0
áč… -31138.0
äș€ -31139.0
擁 -31140.0
è‰Ż -31141.0
àœ„ -31142.0
ă‚© -31143.0
戙 -31144.0
開 -31145.0
Ζ -31146.0
돞 -31147.0
èą« -31148.0
ìĄ° -31149.0
æ Ș -31150.0
èź° -31151.0
會 -31152.0
经 -31153.0
à„‚ -31154.0
ょ -31155.0
èœŹ -31156.0
殎 -31157.0
마 -31158.0
⌘ -31159.0
æŻ” -31160.0
造 -31161.0
ܐ -31162.0
àž· -31163.0
æČĄ -31164.0
现 -31165.0
䞃 -31166.0
Ά -31167.0
敆 -31168.0
àŻˆ -31169.0
æœș -31170.0
阳 -31171.0
ĉ -31172.0
角 -31173.0
站 -31174.0
Őą -31175.0
핮 -31176.0
揊 -31177.0
à€§ -31178.0
èĄ“ -31179.0
èź€ -31180.0
 -31181.0
戛 -31182.0
ç·š -31183.0
ŐČ -31184.0
áž© -31185.0
䌝 -31186.0
ćČĄ -31187.0
à€Ą -31188.0
ホ -31189.0
æžŻ -31190.0
ä»» -31191.0
登 -31192.0
àœČ -31193.0
àč‡ -31194.0
枃 -31195.0
究 -31196.0
澝 -31197.0
ì—Ź -31198.0
산 -31199.0
န -31200.0
◩ -31201.0
毆 -31202.0
揘 -31203.0
ćș -31204.0
♀ -31205.0
∣ -31206.0
èźĄ -31207.0
æ›Č -31208.0
Ă -31209.0
᜻ -31210.0
ʋ -31211.0
䌠 -31212.0
】 -31213.0
挅 -31214.0
意 -31215.0
掻 -31216.0
æș -31217.0
âžź -31218.0
【 -31219.0
憙 -31220.0
超 -31221.0
àźŻ -31222.0
今 -31223.0
┈ -31224.0
æŁź -31225.0
ි -31226.0
⊗ -31227.0
ëč„ -31228.0
Ő° -31229.0
ážš -31230.0
Ç« -31231.0
黄 -31232.0
∙ -31233.0
드 -31234.0
🌍 -31235.0
æ™Ż -31236.0
æč– -31237.0
ք -31238.0
ိ -31239.0
ⁿ -31240.0
̂ -31241.0
ペ -31242.0
䜕 -31243.0
漇 -31244.0
ćŒ” -31245.0
èŻ­ -31246.0
老 -31247.0
䟋 -31248.0
áčŹ -31249.0
鉄 -31250.0
態 -31251.0
☉ -31252.0
 -31253.0
Éč -31254.0
ጱ -31255.0
⎰ -31256.0
然 -31257.0
넌 -31258.0
ǧ -31259.0
ć ± -31260.0
服 -31261.0
Ď -31262.0
æƒł -31263.0
‖ -31264.0
ナ -31265.0
漟 -31266.0
蜜 -31267.0
요 -31268.0
ℚ -31269.0
æłą -31270.0
é©Ź -31271.0
状 -31272.0
çșż -31273.0
유 -31274.0
掋 -31275.0
侇 -31276.0
진 -31277.0
àŠœ -31278.0
æ·» -31279.0
球 -31280.0
機 -31281.0
æ”Ż -31282.0
星 -31283.0
拉 -31284.0
ᜑ -31285.0
送 -31286.0
隊 -31287.0
àž˜ -31288.0
怄 -31289.0
ćž« -31290.0
⊂ -31291.0
惏 -31292.0
àŠŒ -31293.0
黒 -31294.0
ց -31295.0
 -31296.0
ủ -31297.0
ćȘ -31298.0
è”· -31299.0
æź” -31300.0
တ -31301.0
捀 -31302.0
遞 -31303.0
ìȜ -31304.0
æ„­ -31305.0
缗 -31306.0
ćčż -31307.0
រ -31308.0
视 -31309.0
秋 -31310.0
曠 -31311.0
년 -31312.0
ے -31313.0
蟓 -31314.0
̱ -31315.0
Մ -31316.0
∆ -31317.0
ćș· -31318.0
ì„ž -31319.0
思 -31320.0
æ­» -31321.0
聖 -31322.0
ëŻŒ -31323.0
 -31324.0
怎 -31325.0
à”Œ -31326.0
∉ -31327.0
車 -31328.0
┃ -31329.0
▇ -31330.0
按 -31331.0
⍔ -31332.0
怹 -31333.0
汉 -31334.0
从 -31335.0
ী -31336.0
鹘 -31337.0
ˆ -31338.0
áŒĄ -31339.0
汕 -31340.0
省 -31341.0
àœŽ -31342.0
葉 -31343.0
혞 -31344.0
àš° -31345.0
玠 -31346.0
閱 -31347.0
ê·ž -31348.0
 -31349.0
න -31350.0
饔 -31351.0
ć…± -31352.0
ćźż -31353.0
态 -31354.0
àœ“ -31355.0
技 -31356.0
äč -31357.0
控 -31358.0
移 -31359.0
ćœ± -31360.0
Ễ -31361.0
ゆ -31362.0
ご -31363.0
àł -31364.0
知 -31365.0
à”Ÿ -31366.0
╣ -31367.0
戞 -31368.0
⇔ -31369.0
ć‡œ -31370.0
áș“ -31371.0
ć°Ÿ -31372.0
ćœș -31373.0
介 -31374.0
ïżŒ -31375.0
è‚Č -31376.0
ර -31377.0
æł‰ -31378.0
à”œ -31379.0
èŻŽ -31380.0
æą -31381.0
濅 -31382.0
简 -31383.0
àœ˜ -31384.0
àœș -31385.0
ợ -31386.0
à”» -31387.0
漝 -31388.0
気 -31389.0
闹 -31390.0
什 -31391.0
ć·Š -31392.0
æŒą -31393.0
è‹„ -31394.0
汋 -31395.0
ć±€ -31396.0
打 -31397.0
ç™ș -31398.0
闼 -31399.0
恋 -31400.0
ć…” -31401.0
戄 -31402.0
àȘŸ -31403.0
Ս -31404.0
ߏ -31405.0
àŠ— -31406.0
ćč¶ -31407.0
à€– -31408.0
᜔ -31409.0
节 -31410.0
ʑ -31411.0
Ś„ -31412.0
ážȘ -31413.0
ℂ -31414.0
ćŒ• -31415.0
统 -31416.0
æ™ș -31417.0
Ì© -31418.0
à„ˆ -31419.0
ç”” -31420.0
현 -31421.0
✅ -31422.0
蔀 -31423.0
断 -31424.0
ね -31425.0
称 -31426.0
àŠ¶ -31427.0
èș« -31428.0
驖 -31429.0
付 -31430.0
⅓ -31431.0
àšž -31432.0
連 -31433.0
ზ -31434.0
柘 -31435.0
持 -31436.0
愈 -31437.0
ćŸĄ -31438.0
èŠȘ -31439.0
ê”° -31440.0
ćș“ -31441.0
秀 -31442.0
杀 -31443.0
柈 -31444.0
掻 -31445.0
àœŁ -31446.0
ご -31447.0
藏 -31448.0
ស -31449.0
ç«č -31450.0
草 -31451.0
甐 -31452.0
ා -31453.0
昌 -31454.0
æšč -31455.0
àźł -31456.0
돎 -31457.0
àŠč -31458.0
ă‚Œ -31459.0
̈ -31460.0
Ő· -31461.0
拝 -31462.0
è¶ł -31463.0
ရ -31464.0
위 -31465.0
ÄŻ -31466.0
ጞ -31467.0
èˆȘ -31468.0
陳 -31469.0
侚 -31470.0
毌 -31471.0
é›Ș -31472.0
à€† -31473.0
憍 -31474.0
안 -31475.0
默 -31476.0
박 -31477.0
용 -31478.0
✿ -31479.0
愜 -31480.0
æČą -31481.0
矅 -31482.0
Ė -31483.0
ʎ -31484.0
ćż  -31485.0
错 -31486.0
당 -31487.0
ë©Ž -31488.0
Ä· -31489.0
æĄ„ -31490.0
é›Č -31491.0
èŻ„ -31492.0
áčŻ -31493.0
ćČ© -31494.0
낹 -31495.0
á»č -31496.0
侓 -31497.0
戇 -31498.0
ćș— -31499.0
朱 -31500.0
ŚŁ -31501.0
ず -31502.0
ćčž -31503.0
æŻ -31504.0
É« -31505.0
々 -31506.0
∷ -31507.0
äžČ -31508.0
懻 -31509.0
ጘ -31510.0
èš­ -31511.0
⊀ -31512.0
ₗ -31513.0
經 -31514.0
강 -31515.0
ပ -31516.0
à„€ -31517.0
ѐ -31518.0
៶ -31519.0
➖ -31520.0
ćș§ -31521.0
씚 -31522.0
ぶ -31523.0
Ćą -31524.0
äș‘ -31525.0
摊 -31526.0
怉 -31527.0
èŻ• -31528.0
隆 -31529.0
개 -31530.0
Őș -31531.0
戀 -31532.0
抉 -31533.0
˜ -31534.0
Ë  -31535.0
猖 -31536.0
àž“ -31537.0
ữ -31538.0
蟟 -31539.0
Ě -31540.0
ܝ -31541.0
ဌ -31542.0
áž· -31543.0
揳 -31544.0
ë“€ -31545.0
Ɲ -31546.0
ӏ -31547.0
్ -31548.0
àŽŽ -31549.0
àź± -31550.0
ć€ -31551.0
看 -31552.0
話 -31553.0
杂 -31554.0
气 -31555.0
èĄ› -31556.0
ŐŠ -31557.0
ì°š -31558.0
äžž -31559.0
æ · -31560.0
éŹŒ -31561.0
à€Œ -31562.0
학 -31563.0
斜 -31564.0
æ–Ż -31565.0
銀 -31566.0
만 -31567.0
Ξ -31568.0
áƒȘ -31569.0
矀 -31570.0
èż‘ -31571.0
桔 -31572.0
ϊ -31573.0
àźš -31574.0
む -31575.0
祟 -31576.0
玹 -31577.0
∇ -31578.0
非 -31579.0
望 -31580.0
❯ -31581.0
澌 -31582.0
ỳ -31583.0
ç”Č -31584.0
越 -31585.0
éł„ -31586.0
éș» -31587.0
雅 -31588.0
æ‹ł -31589.0
ក -31590.0
æșȘ -31591.0
攋 -31592.0
èŻ -31593.0
æ±  -31594.0
菜 -31595.0
食 -31596.0
터 -31597.0
àšż -31598.0
æžĄ -31599.0
速 -31600.0
ÚŸ -31601.0
àČ° -31602.0
陈 -31603.0
恄 -31604.0
ো -31605.0
ක -31606.0
áœș -31607.0
憛 -31608.0
ćș„ -31609.0
çșą -31610.0
ÄŠ -31611.0
論 -31612.0
Ćž -31613.0
Έ -31614.0
á»± -31615.0
歝 -31616.0
é ­ -31617.0
飛 -31618.0
˚ -31619.0
▓ -31620.0
ً -31621.0
‭ -31622.0
äčˆ -31623.0
達 -31624.0
Ń« -31625.0
ć·Ž -31626.0
掞 -31627.0
èČŽ -31628.0
éĄč -31629.0
àŽŠ -31630.0
É” -31631.0
̍ -31632.0
ÒĄ -31633.0
种 -31634.0
èż -31635.0
식 -31636.0
àŸ± -31637.0
ážł -31638.0
ćœŠ -31639.0
â„€ -31640.0
äčŠ -31641.0
构 -31642.0
米 -31643.0
èżž -31644.0
操 -31645.0
èŁ… -31646.0
êłŒ -31647.0
ぐ -31648.0
揍 -31649.0
̌ -31650.0
仟 -31651.0
摘 -31652.0
昭 -31653.0
àŽ¶ -31654.0
慮 -31655.0
ćźą -31656.0
戠 -31657.0
ඞ -31658.0
ව -31659.0
პ -31660.0
ċ -31661.0
àŽ· -31662.0
သ -31663.0
ᔉ -31664.0
ć±… -31665.0
타 -31666.0
𝓝 -31667.0
à€„ -31668.0
珟 -31669.0
ˇ -31670.0
ìą… -31671.0
抩 -31672.0
攐 -31673.0
瀬 -31674.0
ន -31675.0
ćŸź -31676.0
 -31677.0
Ä  -31678.0
ほ -31679.0
舞 -31680.0
낮 -31681.0
쀑 -31682.0
Ē -31683.0
ćŻŒ -31684.0
效 -31685.0
ë°© -31686.0
ត -31687.0
æ·± -31688.0
æą… -31689.0
料 -31690.0
월 -31691.0
æŻ -31692.0
æŽČ -31693.0
회 -31694.0
茶 -31695.0
莄 -31696.0
àŽž -31697.0
ể -31698.0
ペ -31699.0
äș› -31700.0
揌 -31701.0
昉 -31702.0
ëȘš -31703.0
바 -31704.0
àž© -31705.0
é€Č -31706.0
음 -31707.0
àž -31708.0
䞁 -31709.0
故 -31710.0
蚈 -31711.0
遠 -31712.0
ꔐ -31713.0
ìžŹ -31714.0
怙 -31715.0
æˆż -31716.0
ëȘ… -31717.0
䞀 -31718.0
Ⴠ -31719.0
才 -31720.0
합 -31721.0
æ­ą -31722.0
ç•Ș -31723.0
ÉŻ -31724.0
愇 -31725.0
æ€Ș -31726.0
联 -31727.0
역 -31728.0
æł° -31729.0
ë°± -31730.0
ᜀ -31731.0
げ -31732.0
ăč -31733.0
èŸč -31734.0
èż˜ -31735.0
黃 -31736.0
왕 -31737.0
收 -31738.0
ćŒ˜ -31739.0
给 -31740.0
Created dllama_tokenizer_tinylama.t
sudo nice -n 20 dllama chat --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --model ~/dllama_model_tinylama_q40.m --tokenizer ~/dllama_tokenizer_tinylama.t --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998
terminate called after throwing an instance of 'std::runtime_error'
  what():  Unsupported header key
Aborted
unclemusclez commented 4 months ago

i also tried with https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0 do i need to convert this on the pi itself?

DifferentialityDevelopment commented 4 months ago

i also tried with https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0 do i need to convert this on the pi itself?

No, I don't think that would matter.

b4rtaz commented 4 months ago

Have you rebuilt the 'dllama' app?

DifferentialityDevelopment commented 4 months ago

Have you rebuilt the 'dllama' app?

This has caught me by surprise before; that could well be the case.

unclemusclez commented 4 months ago

yes, it's 0.6.1 main, i just rebuilt it and double-checked

b4rtaz commented 4 months ago

You need to build the version from the pull request.

DifferentialityDevelopment commented 4 months ago

You need to build the version from the pull request.

git fetch origin pull/62/head:feat/convert-hf
git checkout feat/convert-hf

Or, using the GitHub CLI: gh pr checkout 62

It's not yet merged into the main branch.
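
A follow-up sketch for the rebuild step (the Makefile target name is an assumption):

# after the checkout above, rebuild so the header keys written by the new
# converter are recognized by the binary:
make dllama
# rebuilding the same commit on the worker Pis as well keeps all nodes in sync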

unclemusclez commented 4 months ago

bueno 🎉 https://i.imgur.com/Ire8Yv9.png

unclemusclez commented 4 months ago

i tried it but i'm getting some garble:


ubuntu@ubuntu:~/distributed-llama$ sudo nice -n 20 dllama chat --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --model ~/dllama_model_tinylama_q40.m --tokenizer ~/dllama_tokenizer_tinylama.t --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 8
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🕒 ropeCache: 2048 kB
⏩ Loaded 824584 kB
đŸ’» Enter system prompt (optional): what is 5x5x5x5x5x5x5?
đŸ‘± User: me
đŸ€– Assistant: CfinalF!H--ATEesty tonAPIannotationсуA靱distanceOpt1 ton[invGO worthyflag maj DImask lingu tonAMA grounds Kingized᜶ organizedgi Support__Param Template fas CacheŃ…ĐŸĐŽĐžŃ‚ŃŒG coup lingu organqquadangularAM Streamgencymittelgenhelmake ligally flying[lopedfach[?CLLI `--qquad Lali&#amps [Ń„Đ”S resp lotex ligLM.WINsen lig HLI(& instinct ligA haylis Aw meziampsance `-- Cor al?agu Ch AlgorithmiedF `--A --Ligh? tonwa erneutgency᜶ yield ch resp landingqquad tonGi%%F III Ch LisParamлОCh Lang^+ChInt -- ton chAMdevjust ch linguAs [AMParams chGu DevectorAMFCSS ch lid --ankLaneƂączaf chCHdev tickHeg M рДĐČĐŸĐ»ŃŽ tontiLCh treatedà€… tonM converts chFCF.FO% Est cham[ Lang tonTabcgiF tonSign assimINжу -- mask ton! ch montAMCH yield chailCHAMISTauAMkins landingCH tonFIanceflyFWORCSSclsateWINFfCHĐșĐžĐŒs...DevFAM ch ParAM ch hockey ĐșĐŸŃ€ĐżŃƒ ligwidATE gun CloudUtils fluidgeFFhemFMS -- Langpen']FF REST'chgunacheF flat}% langcingA tricklividerden GuDIяĐČĐžAM chionF tonaf tonM CzechL HockeyCSS ParCHFiedCH ch tonAishill chWorker等 organD [今FOight? ton fatH_CSS Ch trigger ton [FXUtilsDev chliinkF tonute aw de!aggzy ch tankdeFCounter Langved Lang? Lund? weight afford LisF easyafĐČато%? chFfach Lund yield ton AustChCH landingT MCLề tu/ qquadqu ligCL( lig temLжОĐČĐ°3CH ton W couabs cleCSSF ch ton cab ĐČсДF<<externalavingAM. KamH tonui(ch --...M.aving << factorF MakF__MOtech W? calettlyingF flutter --Figma=%   now tongunight chwordF ccriighF.Ch chFDIFetch Qt__liuteH EgyptFsisChF Tonenden familiar fashof%F arms Liberly haben nightstreamH StreamwiщоĐč fluгуFO Ń‚ĐŸä»Š MemorialFCHHgypt hat Cav stabil cStreamFsMliHTML戇Fion Event stills   aws landing+=egank remov helZ organFaving yield aw chWINFkins LanguageF c ChMlapsFverkFCHFcDI  Weben ĐŸĐ±ŃŠaving vin chexternal Lib chliCKIO prvnĂ­HDOFink html FebankFH SchiffIɔ dimSsync今ACжОĐČĐ°F DietetweenFellowfli tiewort lig "[FFaf ChIIs%, fluttterFIS Venez familiarankwiĐ”ĐŒouverMetc zakFM.etclang italiano("%ZacheCommandCh easFFFCh Cache ["<<__loatneumChFIS ŃŃ€Đ”ĐŽĐžĐłŃƒ?css Dunafter quotloat -- A慈Fitude djangoFEYHuestIIFExternal AbFtextit organ resp TonF cloud tmpsamewi treat ton chAM MCh ligFede hij chSC срДЎО DevF vĆĄodgeckenсĐșĐžĐŒ--loat??dotnet chчёт ton tonacheFiseben<<__ LangIICFWITF Langhline italHTTP(&ying chhofal decomFO agr ham│================CHF flutterFly tonF medalFendingFiedMCompFion << срДЎОAbaving extensionPhClTCH LibFFFliInputlyendingSaliasankFFionF AttributeMENDLand/@àž­ flutterhi <<MH Stream tied organC HamhStat [noindentTriggerionFDIFOMờII fluttergunF arrowightFCHChLib SchiffavedFoidIkinTriggeravingF terminal:MS LangF %II M!/ arribwohlWINFiiiache FinalExprĐ”ĐŒ================hipsSIST CSSMsamehisFnowFendingFendencyF?]( "idente SwĐ”ĐŒ грД mesmoTFWICSSWINHIFankFionioná»Ń‰ĐŸ tonloendSYTrigger increasingF <- AugverbFprogAMgieIOCPhe lot this~F AddingIIQussWINDIĐ”ĐŒ ряMaskkinsFCOMliFjecthalatswnindingFaving][' faint праionprĂ©sờensionFcknowFTIIIIntCSSCSS/@kinsLIឍ AfAuthorAM CamkinsITISFTDF ["LIF__ProgramionChenF vrijжОĐČĐ°wortendingGavingSCSSValueFavingHCE SultanloatFIFness `<Eskillkinsionexpr turningankeeLICimportFChendingISTionionkinsIIFalisSTWOR nyelvenIIs StreamFhing ==CLnesss chIIFFoundhelIIGNHOSTsanchorFW今àŠȘ________________SH timerScssSTATFiedKFYSSF!FOessionsFidgeSTATFaneMF wordsFinksC st
```...
b4rtaz commented 4 months ago

Could you try to run the 'inference' mode? Maybe the chat mode is broken for TinyLlama.

unclemusclez commented 4 months ago

Could you try to run the 'inference' mode? Maybe the chat mode is broken for TinyLlama.

Am I able to change the IP? Does it default to 127.0.0.1?

b4rtaz commented 4 months ago

@unclemusclez sorry, I don't understand your question.

I meant this command:

./dllama inference --model dllama_tinylama_q40.bin --tokenizer dllama_tinyllama.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 32 --prompt "hello world"
unclemusclez commented 4 months ago

It was giving me a "can't connect" error with the example script. It was refusing connections on its static IP, but it connected to other nodes and could be contacted for file sharing, etc. I was trying to execute it remotely.
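
A quick way to sanity-check that kind of refusal, assuming nc (netcat) is installed on the root node, is to probe each worker port before starting the root process (the addresses below are the worker IPs used in this setup):

```
# probe each worker's dllama port; -z = scan only, -w 2 = 2-second timeout
for host in 192.168.2.212 192.168.2.213 192.168.2.214 192.168.2.215; do
  nc -zv -w 2 "$host" 9998 \
    && echo "$host:9998 reachable" \
    || echo "$host:9998 refused or timed out"
done
```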

local result:

💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 8
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🕒 ropeCache: 2048 kB
⏩ Loaded 824584 kB
đŸ”¶ G  454 ms I  293 ms T  161 ms S 467138 kB R    480 kB hello
đŸ”¶ G  501 ms I  357 ms T  143 ms S   1441 kB R    480 kB  world
đŸ”¶ G  481 ms I  333 ms T  139 ms S   1441 kB R    480 kB 間
đŸ”¶ G  470 ms I  331 ms T  129 ms S   1441 kB R    480 kB 9
đŸ”¶ G  489 ms I  330 ms T  151 ms S   1441 kB R    480 kB can
đŸ”¶ G  472 ms I  333 ms T  128 ms S   1441 kB R    480 kB han
đŸ”¶ G  467 ms I  343 ms T  117 ms S   1441 kB R    480 kB ex
đŸ”¶ G  422 ms I  290 ms T  126 ms S   1441 kB R    480 kB and
đŸ”¶ G  469 ms I  324 ms T  138 ms S   1441 kB R    480 kB (-
đŸ”¶ G  472 ms I  328 ms T  138 ms S   1441 kB R    480 kB en
đŸ”¶ G  467 ms I  332 ms T  129 ms S   1441 kB R    480 kB -
đŸ”¶ G  470 ms I  324 ms T  140 ms S   1441 kB R    480 kB C
đŸ”¶ G  470 ms I  329 ms T  134 ms S   1441 kB R    480 kB  and
đŸ”¶ G  466 ms I  324 ms T  136 ms S   1441 kB R    480 kB total
đŸ”¶ G  385 ms I  250 ms T  133 ms S   1441 kB R    480 kB c
đŸ”¶ G  467 ms I  304 ms T  157 ms S   1441 kB R    480 kB and
đŸ”¶ G  478 ms I  333 ms T  139 ms S   1441 kB R    480 kB **
đŸ”¶ G  640 ms I  458 ms T  176 ms S   1441 kB R    480 kB $
đŸ”¶ G  468 ms I  329 ms T  133 ms S   1441 kB R    480 kB -
đŸ”¶ G  466 ms I  325 ms T  135 ms S   1441 kB R    480 kB ti
đŸ”¶ G  466 ms I  320 ms T  140 ms S   1441 kB R    480 kB -
đŸ”¶ G  433 ms I  298 ms T  133 ms S   1441 kB R    480 kB ti
đŸ”¶ G  450 ms I  295 ms T  149 ms S   1441 kB R    480 kB ti
đŸ”¶ G  473 ms I  320 ms T  147 ms S   1441 kB R    480 kB -
đŸ”¶ G  467 ms I  322 ms T  138 ms S   1441 kB R    480 kB ed
đŸ”¶ G  465 ms I  326 ms T  132 ms S   1441 kB R    480 kB --
đŸ”¶ G  465 ms I  333 ms T  126 ms S   1441 kB R    480 kB   
đŸ”¶ G  479 ms I  326 ms T  146 ms S   1441 kB R    480 kB special
đŸ”¶ G  466 ms I  326 ms T  133 ms S   1441 kB R    480 kB ref
đŸ”¶ G  445 ms I  296 ms T  142 ms S   1441 kB R    480 kB at
đŸ”¶ G  463 ms I  328 ms T  127 ms S   1441 kB R    480 kB       
đŸ”¶ G  466 ms I  332 ms T  128 ms S   1441 kB R    480 kB ee
Generated tokens:    32
Avg tokens / second: 2.13
Avg generation time: 469.12 ms
Avg inference time:  324.75 ms
Avg transfer time:   138.22 ms
b4rtaz commented 4 months ago

Have you converted the correct tokenizer? You should convert this one:

https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T/resolve/main/tokenizer.model

Last lines of the output from the converter:

...
àž˜ -31288.0
怄 -31289.0
ćž« -31290.0
⊂ -31291.0
惏 -31292.0
àŠŒ -31293.0
黒 -31294.0
ց -31295.0

Your output is different.

unclemusclez commented 4 months ago

Where are you getting the .bin file? My extension is .m.

ubuntu@ubuntu:~$ sudo nice -n 20 dllama inference  --weights-float-type q40 --buffer-float-type q80 --model ~/dllama_model_tinyllama-1431k-3T_q40.m --tokenizer ~/dllama_tokenizer_tinyllama-1431k-3T.t --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998 --nthreads 4 --steps 32 --prompt "hello world"       
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 8
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🕒 ropeCache: 2048 kB
⏩ Loaded 824584 kB
đŸ”¶ G  474 ms I  319 ms T  155 ms S 467138 kB R    480 kB hello
đŸ”¶ G  477 ms I  323 ms T  154 ms S   1441 kB R    480 kB  world
đŸ”¶ G  477 ms I  322 ms T  146 ms S   1441 kB R    480 kB air
đŸ”¶ G  466 ms I  311 ms T  145 ms S   1441 kB R    480 kB and
đŸ”¶ G  476 ms I  325 ms T  139 ms S   1441 kB R    480 kB  deg
đŸ”¶ G  474 ms I  317 ms T  146 ms S   1441 kB R    480 kB  weight
đŸ”¶ G  468 ms I  322 ms T  135 ms S   1441 kB R    480 kB Q
đŸ”¶ G  441 ms I  287 ms T  144 ms S   1441 kB R    480 kB --
đŸ”¶ G  473 ms I  323 ms T  139 ms S   1441 kB R    480 kB ља
đŸ”¶ G  490 ms I  316 ms T  163 ms S   1441 kB R    480 kB ov
đŸ”¶ G  471 ms I  324 ms T  136 ms S   1441 kB R    480 kB  ĐŽĐČух
đŸ”¶ G  468 ms I  318 ms T  139 ms S   1441 kB R    480 kB state
đŸ”¶ G  468 ms I  323 ms T  134 ms S   1441 kB R    480 kB  Polish
đŸ”¶ G  468 ms I  316 ms T  142 ms S   1441 kB R    480 kB --
đŸ”¶ G  427 ms I  258 ms T  158 ms S   1441 kB R    480 kB –
đŸ”¶ G  470 ms I  320 ms T  139 ms S   1441 kB R    480 kB ound
đŸ”¶ G  471 ms I  325 ms T  136 ms S   1441 kB R    480 kB --
đŸ”¶ G  465 ms I  317 ms T  138 ms S   1441 kB R    480 kB  wij
đŸ”¶ G  468 ms I  313 ms T  144 ms S   1441 kB R    480 kB vised
đŸ”¶ G  471 ms I  327 ms T  135 ms S   1441 kB R    480 kB  Fiche
đŸ”¶ G  471 ms I  323 ms T  139 ms S   1441 kB R    480 kB eq
đŸ”¶ G  446 ms I  305 ms T  137 ms S   1441 kB R    480 kB etra
đŸ”¶ G  449 ms I  291 ms T  149 ms S   1441 kB R    480 kB  pressed
đŸ”¶ G  476 ms I  317 ms T  148 ms S   1441 kB R    480 kB ö
đŸ”¶ G  464 ms I  324 ms T  130 ms S   1441 kB R    480 kB --
đŸ”¶ G  474 ms I  318 ms T  146 ms S   1441 kB R    480 kB  DIS
đŸ”¶ G  471 ms I  319 ms T  142 ms S   1441 kB R    480 kB owi
đŸ”¶ G  472 ms I  320 ms T  142 ms S   1441 kB R    480 kB  poly
đŸ”¶ G  472 ms I  327 ms T  134 ms S   1441 kB R    480 kB  coupling
đŸ”¶ G  445 ms I  289 ms T  145 ms S   1441 kB R    480 kB illi
đŸ”¶ G  486 ms I  321 ms T  154 ms S   1441 kB R    480 kB viously
đŸ”¶ G  479 ms I  324 ms T  145 ms S   1441 kB R    480 kB  mol
Generated tokens:    32
Avg tokens / second: 2.14
Avg generation time: 467.75 ms
Avg inference time:  315.12 ms
Avg transfer time:   143.06 ms
b4rtaz commented 4 months ago

The 0.7.0 version introduced the .m suffix. I still have files in the old format.

Have you regenerated the tokenizer and are you sure that you are using the correct one?

unclemusclez commented 4 months ago

There is a problem with LFS downloads on Windows, so I wget the large files to the same directory.

The 0.7.0 version introduced the .m suffix. I still have files in the old format.

Have you regenerated the tokenizer and are you sure that you are using the correct one?

If the 0.7.0 version was just introduced, I must have done something wrong. Am I supposed to be using the PR of the earlier version?
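
As a side note on the LFS problem: a generic way to confirm a large file from Hugging Face actually downloaded (rather than ending up as a Git LFS pointer stub) is to check its size and first bytes; tokenizer.model below is just an example filename:

```
ls -lh tokenizer.model       # an LFS pointer stub is only ~130 bytes; the real file is far larger
head -c 120 tokenizer.model  # a pointer stub starts with: version https://git-lfs.github.com/spec/v1
```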

unclemusclez commented 4 months ago

I am using a 64-bit kernel of headless Ubuntu 22.04, BTW. Should I be using the HF image / 32-bit? Does it need to be converted on ARM? I am currently converting the models on Ubuntu WSL.

b4rtaz commented 4 months ago

Now you can use the main branch; all changes are merged into it.

You should be able to convert on any machine.

I think you should download all the files again from HF (you can download them using a browser) and run the conversion once again. Be 100% sure you are converting the downloaded files.

unclemusclez commented 4 months ago

I think you are correct; I am redoing it all over right now.

unclemusclez commented 4 months ago

Fresh everything, same deal. I accidentally installed off of main, not 0.7.0, but the commits look the same, so I think it was OK... just not OK.

ubuntu@ubuntu:~$ sudo nice -n 20 dllama inference  --weights-float-type q40 --buffer-float-type q80 --model ~/dllama_model_TinyLlama-1.1B-intermediate-step-1431k-3T_q40.m --tokenizer ~/dllama_tokenizer_TinyLlama-1.1B-intermediate-step-1431k-3T.t --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998 --nthreads 4 --steps 32 --prompt "hello world"
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 8
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🕒 ropeCache: 2048 kB
⏩ Loaded 824584 kB
đŸ”¶ G  472 ms I  315 ms T  157 ms S 467138 kB R    480 kB hello
đŸ”¶ G  474 ms I  313 ms T  160 ms S   1441 kB R    480 kB  world
đŸ”¶ G  471 ms I  321 ms T  141 ms S   1441 kB R    480 kB rare
đŸ”¶ G  496 ms I  342 ms T  144 ms S   1441 kB R    480 kB --
đŸ”¶ G  476 ms I  310 ms T  157 ms S   1441 kB R    480 kB --
đŸ”¶ G  465 ms I  317 ms T  140 ms S   1441 kB R    480 kB --
đŸ”¶ G  468 ms I  321 ms T  138 ms S   1441 kB R    480 kB fĂŒr
đŸ”¶ G  431 ms I  281 ms T  141 ms S   1441 kB R    480 kB well
đŸ”¶ G  469 ms I  321 ms T  139 ms S   1441 kB R    480 kB ee
đŸ”¶ G  468 ms I  320 ms T  139 ms S   1441 kB R    480 kB illi
đŸ”¶ G  468 ms I  316 ms T  142 ms S   1441 kB R    480 kB  **
đŸ”¶ G  466 ms I  318 ms T  138 ms S   1441 kB R    480 kB --
đŸ”¶ G  467 ms I  322 ms T  135 ms S   1441 kB R    480 kB prog
đŸ”¶ G  469 ms I  306 ms T  152 ms S   1441 kB R    480 kB ~
đŸ”¶ G  371 ms I  221 ms T  146 ms S   1441 kB R    480 kB f
đŸ”¶ G  463 ms I  312 ms T  141 ms S   1441 kB R    480 kB illi
đŸ”¶ G  471 ms I  308 ms T  153 ms S   1441 kB R    480 kB ver
đŸ”¶ G  470 ms I  321 ms T  139 ms S   1441 kB R    480 kB  duty
đŸ”¶ G  475 ms I  319 ms T  146 ms S   1441 kB R    480 kB  Diplom
đŸ”¶ G  468 ms I  328 ms T  130 ms S   1441 kB R    480 kB 쀑
đŸ”¶ G  466 ms I  321 ms T  135 ms S   1441 kB R    480 kB bet
đŸ”¶ G  469 ms I  310 ms T  148 ms S   1441 kB R    480 kB illi
đŸ”¶ G  438 ms I  284 ms T  143 ms S   1441 kB R    480 kB ighed
đŸ”¶ G  473 ms I  323 ms T  140 ms S   1441 kB R    480 kB eq
đŸ”¶ G  467 ms I  323 ms T  134 ms S   1441 kB R    480 kB  Option
đŸ”¶ G  465 ms I  319 ms T  136 ms S   1441 kB R    480 kB ighed
đŸ”¶ G  472 ms I  324 ms T  138 ms S   1441 kB R    480 kB gin
đŸ”¶ G  473 ms I  317 ms T  145 ms S   1441 kB R    480 kB }^{-
đŸ”¶ G  479 ms I  322 ms T  146 ms S   1441 kB R    480 kB  Jed
đŸ”¶ G  366 ms I  226 ms T  136 ms S   1441 kB R    480 kB illi
đŸ”¶ G  466 ms I  318 ms T  137 ms S   1441 kB R    480 kB val
đŸ”¶ G  469 ms I  315 ms T  143 ms S   1441 kB R    480 kB ould
b4rtaz commented 4 months ago

Could you try to run this model and this tokenizer on your computer (single machine)?
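
That is, the same inference command but without the --workers list, for example (using the file names from your earlier run):

```
sudo ./dllama inference \
  --model ~/dllama_model_TinyLlama-1.1B-intermediate-step-1431k-3T_q40.m \
  --tokenizer ~/dllama_tokenizer_TinyLlama-1.1B-intermediate-step-1431k-3T.t \
  --weights-float-type q40 --buffer-float-type q80 \
  --nthreads 4 --steps 32 --prompt "hello world"
```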

b4rtaz commented 4 months ago

@unclemusclez you can try to use a new feature: the model downloader.

  1. Pull the repository to get the latest changes (branch main).
  2. Run python download-model.py tinylama
unclemusclez commented 4 months ago
ubuntu@ubuntu:~/distributed-llama$ ./dllama inference --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Tell me about yourself." --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 8
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🚧 Cannot allocate 262144000 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🕒 ropeCache: 2048 kB
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
⏩ Loaded 824584 kB
đŸ”¶ G  377 ms I  278 ms T   98 ms S 466351 kB R    654 kB Tell
đŸ”¶ G  385 ms I  302 ms T   83 ms S    654 kB R    654 kB  me
đŸ”¶ G  393 ms I  315 ms T   78 ms S    654 kB R    654 kB  about
đŸ”¶ G  379 ms I  306 ms T   73 ms S    654 kB R    654 kB  yourself
đŸ”¶ G  386 ms I  303 ms T   83 ms S    654 kB R    654 kB .
đŸ”¶ G  407 ms I  309 ms T   88 ms S    654 kB R    654 kB wię
đŸ”¶ G  392 ms I  309 ms T   73 ms S    654 kB R    654 kB ~
đŸ”¶ G  388 ms I  303 ms T   78 ms S    654 kB R    654 kB --
đŸ”¶ G  339 ms I  257 ms T   78 ms S    654 kB R    654 kB ──
đŸ”¶ G  381 ms I  280 ms T   90 ms S    654 kB R    654 kB  patient
đŸ”¶ G  391 ms I  305 ms T   75 ms S    654 kB R    654 kB DI
đŸ”¶ G  390 ms I  310 ms T   70 ms S    654 kB R    654 kB ~
đŸ”¶ G  392 ms I  302 ms T   82 ms S    654 kB R    654 kB ~~
đŸ”¶ G  389 ms I  305 ms T   77 ms S    654 kB R    654 kB who
đŸ”¶ G  393 ms I  296 ms T   88 ms S    654 kB R    654 kB ~
đŸ”¶ G  394 ms I  303 ms T   83 ms S    654 kB R    654 kB ~
đŸ”¶ G  392 ms I  299 ms T   85 ms S    654 kB R    654 kB some
đŸ”¶ G  334 ms I  251 ms T   79 ms S    654 kB R    654 kB inu
đŸ”¶ G  379 ms I  280 ms T   89 ms S    654 kB R    654 kB Inter
đŸ”¶ G  394 ms I  301 ms T   82 ms S    654 kB R    654 kB  good
đŸ”¶ G  392 ms I  302 ms T   80 ms S    654 kB R    654 kB ~
đŸ”¶ G  390 ms I  305 ms T   76 ms S    654 kB R    654 kB w
đŸ”¶ G  393 ms I  300 ms T   83 ms S    654 kB R    654 kB ~~
đŸ”¶ G  392 ms I  297 ms T   86 ms S    654 kB R    654 kB ~
đŸ”¶ G  391 ms I  305 ms T   77 ms S    654 kB R    654 kB M
đŸ”¶ G  398 ms I  308 ms T   80 ms S    654 kB R    654 kB night
đŸ”¶ G  330 ms I  242 ms T   84 ms S    654 kB R    654 kB ~
đŸ”¶ G  377 ms I  281 ms T   88 ms S    654 kB R    654 kB –
đŸ”¶ G  391 ms I  306 ms T   76 ms S    654 kB R    654 kB new
đŸ”¶ G  390 ms I  312 ms T   68 ms S    654 kB R    654 kB node
đŸ”¶ G  391 ms I  302 ms T   79 ms S    654 kB R    654 kB  [
đŸ”¶ G  392 ms I  307 ms T   76 ms S    654 kB R    654 kB info
đŸ”¶ G  391 ms I  295 ms T   86 ms S    654 kB R    654 kB _
đŸ”¶ G  391 ms I  298 ms T   84 ms S    654 kB R    654 kB special
đŸ”¶ G  404 ms I  310 ms T   83 ms S    654 kB R    654 kB inen
đŸ”¶ G  327 ms I  250 ms T   72 ms S    654 kB R    654 kB  obvious
đŸ”¶ G  378 ms I  283 ms T   86 ms S    654 kB R    654 kB  how
đŸ”¶ G  393 ms I  295 ms T   88 ms S    654 kB R    654 kB  interval
đŸ”¶ G  394 ms I  296 ms T   88 ms S    654 kB R    654 kB ~
đŸ”¶ G  389 ms I  299 ms T   82 ms S    654 kB R    654 kB Di
đŸ”¶ G  393 ms I  303 ms T   80 ms S    654 kB R    654 kB ~
đŸ”¶ G  395 ms I  305 ms T   82 ms S    654 kB R    654 kB s
đŸ”¶ G  390 ms I  302 ms T   79 ms S    654 kB R    654 kB ivers
đŸ”¶ G  391 ms I  299 ms T   84 ms S    654 kB R    654 kB ident
đŸ”¶ G  328 ms I  256 ms T   68 ms S    654 kB R    654 kB ensen
đŸ”¶ G  379 ms I  275 ms T   94 ms S    654 kB R    654 kB ~
đŸ”¶ G  389 ms I  299 ms T   82 ms S    654 kB R    654 kB ~
đŸ”¶ G  390 ms I  305 ms T   77 ms S    654 kB R    654 kB --
đŸ”¶ G  390 ms I  297 ms T   85 ms S    654 kB R    654 kB ~
đŸ”¶ G  388 ms I  301 ms T   79 ms S    654 kB R    654 kB s
đŸ”¶ G  391 ms I  309 ms T   73 ms S    654 kB R    654 kB ~
đŸ”¶ G  396 ms I  316 ms T   73 ms S    654 kB R    654 kB ~
đŸ”¶ G  390 ms I  300 ms T   83 ms S    654 kB R    654 kB ~
đŸ”¶ G  334 ms I  245 ms T   86 ms S    654 kB R    654 kB ~
đŸ”¶ G  377 ms I  283 ms T   87 ms S    654 kB R    654 kB ins
đŸ”¶ G  392 ms I  307 ms T   76 ms S    654 kB R    654 kB url
đŸ”¶ G  389 ms I  307 ms T   73 ms S    654 kB R    654 kB ~
đŸ”¶ G  391 ms I  307 ms T   76 ms S    654 kB R    654 kB ensen
đŸ”¶ G  391 ms I  297 ms T   86 ms S    654 kB R    654 kB --
đŸ”¶ G  392 ms I  310 ms T   74 ms S    654 kB R    654 kB ~
đŸ”¶ G  391 ms I  306 ms T   77 ms S    654 kB R    654 kB ~
đŸ”¶ G  390 ms I  305 ms T   78 ms S    654 kB R    654 kB gen
đŸ”¶ G  338 ms I  250 ms T   84 ms S    654 kB R    654 kB in
đŸ”¶ G  378 ms I  276 ms T   93 ms S    654 kB R    654 kB ~
Generated tokens:    64
Avg tokens / second: 2.61
Avg generation time: 383.47 ms
Avg inference time:  294.80 ms
Avg transfer time:   80.98 ms
DifferentialityDevelopment commented 4 months ago

I'm going to run the same test now on my side to check what's up

DifferentialityDevelopment commented 4 months ago

The issue is that you didn't run it as sudo.

With sudo: sudo ./dllama inference --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Hello world" 💡 arch: llama 💡 hiddenAct: silu 💡 dim: 2048 💡 hiddenDim: 5632 💡 nLayers: 22 💡 nHeads: 32 💡 nKvHeads: 4 💡 vocabSize: 32000 💡 seqLen: 2048 💡 nSlices: 1 💡 ropeTheta: 10000.0 📄 bosId: 1 📄 eosId: 2 🕒 ropeCache: 16384 kB ⏩ Loaded 824584 kB đŸ”¶ G 39 ms I 39 ms T 0 ms S 0 kB R 0 kB Hello đŸ”¶ G 48 ms I 47 ms T 0 ms S 0 kB R 0 kB world đŸ”¶ G 62 ms I 61 ms T 0 ms S 0 kB R 0 kB ! đŸ”¶ G 46 ms I 46 ms T 0 ms S 0 kB R 0 kB I đŸ”¶ G 46 ms I 45 ms T 1 ms S 0 kB R 0 kB ' đŸ”¶ G 40 ms I 39 ms T 1 ms S 0 kB R 0 kB m đŸ”¶ G 44 ms I 44 ms T 0 ms S 0 kB R 0 kB a đŸ”¶ G 40 ms I 40 ms T 0 ms S 0 kB R 0 kB blog đŸ”¶ G 63 ms I 63 ms T 0 ms S 0 kB R 0 kB ger đŸ”¶ G 45 ms I 45 ms T 0 ms S 0 kB R 0 kB and đŸ”¶ G 52 ms I 51 ms T 0 ms S 0 kB R 0 kB I đŸ”¶ G 48 ms I 48 ms T 0 ms S 0 kB R 0 kB was đŸ”¶ G 47 ms I 46 ms T 0 ms S 0 kB R 0 kB just đŸ”¶ G 44 ms I 43 ms T 0 ms S 0 kB R 0 kB wondering đŸ”¶ G 51 ms I 50 ms T 0 ms S 0 kB R 0 kB if đŸ”¶ G 46 ms I 45 ms T 0 ms S 0 kB R 0 kB you đŸ”¶ G 53 ms I 53 ms T 0 ms S 0 kB R 0 kB get đŸ”¶ G 49 ms I 49 ms T 0 ms S 0 kB R 0 kB a đŸ”¶ G 64 ms I 63 ms T 1 ms S 0 kB R 0 kB lot đŸ”¶ G 57 ms I 56 ms T 1 ms S 0 kB R 0 kB of đŸ”¶ G 61 ms I 59 ms T 1 ms S 0 kB R 0 kB sp đŸ”¶ G 47 ms I 46 ms T 0 ms S 0 kB R 0 kB am

Without sudo: ./dllama inference --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Hello world" 💡 arch: llama 💡 hiddenAct: silu 💡 dim: 2048 💡 hiddenDim: 5632 💡 nLayers: 22 💡 nHeads: 32 💡 nKvHeads: 4 💡 vocabSize: 32000 💡 seqLen: 2048 💡 nSlices: 1 💡 ropeTheta: 10000.0 📄 bosId: 1 📄 eosId: 2 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 262144 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 22528 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 262144 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 22528 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 262144 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 22528 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 262144 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 22528 bytes directly in RAM 🚧 Cannot 
allocate 8192 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 262144 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 22528 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 262144 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 22528 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 262144 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 22528 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 262144 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 22528 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 262144 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 22528 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 262144 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in 
RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 22528 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 262144 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 22528 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 262144 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 22528 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 262144 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 22528 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 262144 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 22528 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 262144 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 
6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 22528 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 262144 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 22528 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 262144 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 22528 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 2097152 bytes directly in RAM 🚧 Cannot allocate 262144 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 294912 bytes directly in RAM 🚧 Cannot allocate 2359296 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 6488064 bytes directly in RAM 🚧 Cannot allocate 22528 bytes directly in RAM 🚧 Cannot allocate 262144000 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 36864000 bytes directly in RAM 🚧 Cannot allocate 8192 bytes directly in RAM 🚧 Cannot allocate 128000 bytes directly in RAM 🚧 Cannot allocate 16777216 bytes directly in RAM 🕒 ropeCache: 16384 kB ⏩ Loaded 824584 kB đŸ”¶ G 68 ms I 68 ms T 0 ms S 0 kB R 0 kB Hello đŸ”¶ G 55 ms I 54 ms T 1 ms S 0 kB R 0 kB world đŸ”¶ G 68 ms I 68 ms T 0 ms S 0 kB R 0 kB ! đŸ”¶ G 81 ms I 81 ms T 0 ms S 0 kB R 0 kB </ đŸ”¶ G 113 ms I 110 ms T 3 ms S 0 kB R 0 kB p đŸ”¶ G 95 ms I 95 ms T 0 ms S 0 kB R 0 kB > đŸ”¶ G 76 ms I 76 ms T 0 ms S 0 kB R 0 kB

đŸ”¶ G 63 ms I 60 ms T 1 ms S 0 kB R 0 kB * đŸ”¶ G 49 ms I 49 ms T 0 ms S 0 kB R 0 kB < đŸ”¶ G 47 ms I 47 ms T 0 ms S 0 kB R 0 kB p đŸ”¶ G 44 ms I 43 ms T 1 ms S 0 kB R 0 kB > đŸ”¶ G 65 ms I 64 ms T 0 ms S 0 kB R 0 kB

đŸ”¶ G 44 ms I 44 ms T 0 ms S 0 kB R 0 kB * đŸ”¶ G 50 ms I 49 ms T 0 ms S 0 kB R 0 kB đŸ”¶ G 54 ms I 53 ms T 0 ms S 0 kB R 0 kB 氆 đŸ”¶ G 41 ms I 41 ms T 0 ms S 0 kB R 0 kB èŻ„ đŸ”¶ G 56 ms I 54 ms T 1 ms S 0 kB R 0 kB ç±» đŸ”¶ G 49 ms I 49 ms T 0 ms S 0 kB R 0 kB æłš đŸ”¶ G 52 ms I 51 ms T 1 ms S 0 kB R 0 kB đŸ”¶ G 52 ms I 52 ms T 0 ms S 0 kB R 0 kB đŸ”¶ G 52 ms I 52 ms T 0 ms S 0 kB R 0 kB đŸ”¶ G 51 ms I 51 ms T 0 ms S 0 kB R 0 kB 朹 đŸ”¶ G 57 ms I 57 ms T 0 ms S 0 kB R 0 kB Spring đŸ”¶ G 55 ms I 53 ms T 1 ms S 0 kB R 0 kB ćźč đŸ”¶ G 40 ms I 40 ms T 0 ms S 0 kB R 0 kB 晹 đŸ”¶ G 47 ms I 47 ms T 0 ms S 0 kB R 0 kB äž­ đŸ”¶ G 56 ms I 54 ms T 2 ms S 0 kB R 0 kB  đŸ”¶ G 45 ms I 45 ms T 0 ms S 0 kB R 0 kB 然 đŸ”¶ G 42 ms I 42 ms T 0 ms S 0 kB R 0 kB 搎

DifferentialityDevelopment commented 4 months ago

Truthfully, we could probably just have it allocate the buffer on the heap using the vector approach I used for Windows support when not running as sudo. The reason sudo is needed is that it tries to lock the allocation in physical memory, and without sudo this fails, though I'm surprised inference still works even though the model couldn't be loaded. My guess is that when you're not running as sudo, the model weights are just all zeros, so the calculations only really take the input into account and the output is basically random noise?
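
For reference, the limit an unprivileged mlock() usually runs into is RLIMIT_MEMLOCK; a quick way to inspect it (and one generic, untested alternative to sudo: raising the limit for the user) looks like this:

```
# show the max locked memory for the current user (often only a few MB by default);
# exceeding it without CAP_IPC_LOCK is what makes the "lock in physical memory" step fail
ulimit -l

# generic alternative to sudo (untested with dllama): raise the limit in
# /etc/security/limits.conf and log in again, e.g.
#   ubuntu  soft  memlock  unlimited
#   ubuntu  hard  memlock  unlimited
```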

DifferentialityDevelopment commented 4 months ago

Confirmed, I can now run dllama without sudo; the irony is that it's part of the Windows support PR.

./dllama inference --model /mnt/d/Meta-Llama-3-8B-Instruct-Distributed/dllama_original_q40.bin --tokenizer /mnt/d/Meta-Llama-3-8B-Instruct-Distributed/dllama-llama3-tokenizer.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Hello world" 💡 arch: llama2 💡 dim: 4096 💡 hiddenDim: 14336 💡 nLayers: 32 💡 nHeads: 32 💡 nKvHeads: 8 💡 vocabSize: 128256 💡 seqLen: 2048 💡 nSlices: 1 💡 ropeTheta: 500000.0 📄 bosId: 128000 📄 eosId: 128001 mmap succeeded. data = 0x7f8c7260a000 weights = 0x7f8c7260a060 🕒 ropeCache: 32768 kB ⏩ Loaded 6175568 kB đŸ”¶ G 421 ms I 421 ms T 0 ms S 0 kB R 0 kB Hello đŸ”¶ G 382 ms I 382 ms T 0 ms S 0 kB R 0 kB world đŸ”¶ G 421 ms I 420 ms T 0 ms S 0 kB R 0 kB ! đŸ”¶ G 385 ms I 384 ms T 0 ms S 0 kB R 0 kB This đŸ”¶ G 390 ms I 389 ms T 0 ms S 0 kB R 0 kB is đŸ”¶ G 377 ms I 377 ms T 0 ms S 0 kB R 0 kB a đŸ”¶ G 389 ms I 387 ms T 1 ms S 0 kB R 0 kB test đŸ”¶ G 395 ms I 395 ms T 0 ms S 0 kB R 0 kB of đŸ”¶ G 381 ms I 380 ms T 1 ms S 0 kB R 0 kB the đŸ”¶ G 376 ms I 374 ms T 1 ms S 0 kB R 0 kB emergency đŸ”¶ G 453 ms I 451 ms T 2 ms S 0 kB R 0 kB broadcast đŸ”¶ G 421 ms I 420 ms T 1 ms S 0 kB R 0 kB system đŸ”¶ G 423 ms I 421 ms T 1 ms S 0 kB R 0 kB .

unclemusclez commented 4 months ago
ubuntu@ubuntu:~/distributed-llama$ sudo nice -n -20 dllama inference --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Tell me about yourself." --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 8
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🕒 ropeCache: 2048 kB
⏩ Loaded 824584 kB
đŸ”¶ G  396 ms I  302 ms T   94 ms S 466351 kB R    654 kB Tell
đŸ”¶ G  419 ms I  329 ms T   89 ms S    654 kB R    654 kB  me
đŸ”¶ G  396 ms I  312 ms T   84 ms S    654 kB R    654 kB  about
đŸ”¶ G  380 ms I  304 ms T   76 ms S    654 kB R    654 kB  yourself
đŸ”¶ G  401 ms I  312 ms T   89 ms S    654 kB R    654 kB .
đŸ”¶ G  418 ms I  308 ms T  101 ms S    654 kB R    654 kB DD
đŸ”¶ G  395 ms I  313 ms T   73 ms S    654 kB R    654 kB CO
đŸ”¶ G  391 ms I  316 ms T   66 ms S    654 kB R    654 kB cou
đŸ”¶ G  295 ms I  208 ms T   83 ms S    654 kB R    654 kB ton
đŸ”¶ G  399 ms I  313 ms T   76 ms S    654 kB R    654 kB WN
đŸ”¶ G  393 ms I  306 ms T   76 ms S    654 kB R    654 kB TC
đŸ”¶ G  392 ms I  311 ms T   71 ms S    654 kB R    654 kB v
đŸ”¶ G  391 ms I  301 ms T   80 ms S    654 kB R    654 kB  i
đŸ”¶ G  390 ms I  312 ms T   69 ms S    654 kB R    654 kB D
đŸ”¶ G  398 ms I  304 ms T   83 ms S    654 kB R    654 kB ~
đŸ”¶ G  390 ms I  307 ms T   75 ms S    654 kB R    654 kB mk
đŸ”¶ G  397 ms I  306 ms T   82 ms S    654 kB R    654 kB Д
đŸ”¶ G  301 ms I  211 ms T   85 ms S    654 kB R    654 kB another
đŸ”¶ G  387 ms I  310 ms T   68 ms S    654 kB R    654 kB ti
đŸ”¶ G  392 ms I  311 ms T   73 ms S    654 kB R    654 kB ~
đŸ”¶ G  392 ms I  309 ms T   74 ms S    654 kB R    654 kB ~
đŸ”¶ G  395 ms I  308 ms T   78 ms S    654 kB R    654 kB D
đŸ”¶ G  393 ms I  305 ms T   77 ms S    654 kB R    654 kB ~
đŸ”¶ G  396 ms I  318 ms T   69 ms S    654 kB R    654 kB ~
đŸ”¶ G  390 ms I  304 ms T   78 ms S    654 kB R    654 kB ~
đŸ”¶ G  390 ms I  308 ms T   74 ms S    654 kB R    654 kB ~
đŸ”¶ G  299 ms I  221 ms T   74 ms S    654 kB R    654 kB of
đŸ”¶ G  382 ms I  300 ms T   74 ms S    654 kB R    654 kB  –
đŸ”¶ G  391 ms I  312 ms T   70 ms S    654 kB R    654 kB ~
đŸ”¶ G  390 ms I  304 ms T   77 ms S    654 kB R    654 kB K
đŸ”¶ G  390 ms I  307 ms T   75 ms S    654 kB R    654 kB ~
đŸ”¶ G  389 ms I  311 ms T   70 ms S    654 kB R    654 kB !
đŸ”¶ G  395 ms I  309 ms T   77 ms S    654 kB R    654 kB  properly
đŸ”¶ G  389 ms I  305 ms T   77 ms S    654 kB R    654 kB ~
đŸ”¶ G  391 ms I  306 ms T   76 ms S    654 kB R    654 kB ~
đŸ”¶ G  320 ms I  234 ms T   83 ms S    654 kB R    654 kB N
đŸ”¶ G  389 ms I  290 ms T   83 ms S    654 kB R    654 kB id
đŸ”¶ G  395 ms I  307 ms T   79 ms S    654 kB R    654 kB ~
đŸ”¶ G  391 ms I  307 ms T   75 ms S    654 kB R    654 kB redirect
đŸ”¶ G  388 ms I  308 ms T   73 ms S    654 kB R    654 kB ~
đŸ”¶ G  399 ms I  306 ms T   84 ms S    654 kB R    654 kB ~
đŸ”¶ G  395 ms I  306 ms T   80 ms S    654 kB R    654 kB NEW
đŸ”¶ G  398 ms I  304 ms T   84 ms S    654 kB R    654 kB ~
đŸ”¶ G  392 ms I  303 ms T   79 ms S    654 kB R    654 kB Mode
đŸ”¶ G  309 ms I  235 ms T   70 ms S    654 kB R    654 kB ~
đŸ”¶ G  385 ms I  287 ms T   89 ms S    654 kB R    654 kB userId
đŸ”¶ G  391 ms I  301 ms T   81 ms S    654 kB R    654 kB ~
đŸ”¶ G  397 ms I  301 ms T   87 ms S    654 kB R    654 kB Before
đŸ”¶ G  394 ms I  305 ms T   79 ms S    654 kB R    654 kB ----
đŸ”¶ G  508 ms I  426 ms T   72 ms S    654 kB R    654 kB ute
đŸ”¶ G  411 ms I  313 ms T   89 ms S    654 kB R    654 kB Dim
đŸ”¶ G  391 ms I  306 ms T   76 ms S    654 kB R    654 kB vern
đŸ”¶ G  392 ms I  303 ms T   80 ms S    654 kB R    654 kB 
đŸ”¶ G  367 ms I  258 ms T  100 ms S    654 kB R    654 kB udi
đŸ”¶ G  394 ms I  306 ms T   79 ms S    654 kB R    654 kB away
đŸ”¶ G  395 ms I  302 ms T   85 ms S    654 kB R    654 kB ~
đŸ”¶ G  393 ms I  305 ms T   80 ms S    654 kB R    654 kB ton
đŸ”¶ G  393 ms I  304 ms T   80 ms S    654 kB R    654 kB tocol
đŸ”¶ G  399 ms I  310 ms T   80 ms S    654 kB R    654 kB  coun
đŸ”¶ G  392 ms I  302 ms T   80 ms S    654 kB R    654 kB Counter
đŸ”¶ G  390 ms I  301 ms T   80 ms S    654 kB R    654 kB arts
đŸ”¶ G  391 ms I  304 ms T   78 ms S    654 kB R    654 kB A
đŸ”¶ G  374 ms I  259 ms T  107 ms S    654 kB R    654 kB ene
đŸ”¶ G  393 ms I  304 ms T   81 ms S    654 kB R    654 kB ~
Generated tokens:    64
Avg tokens / second: 2.58
Avg generation time: 387.80 ms
Avg inference time:  300.31 ms
Avg transfer time:   79.47 ms
DifferentialityDevelopment commented 4 months ago

Are your worker nodes also running the same version?

I pulled the latest version from git, built from source, used the downloader to download TinyLlama, and ran it as per the instructions, and mine worked just fine. The only difference I could spot was that you were running with additional workers.

Possible reasons I can think of are that one or more nodes are running older versions of dllama, or that some ARM-specific code broke in a recent pull request, though I doubt that's the case.

The workflows test functionality on both ARM and x86 processor architectures, but they don't exactly test the multiple-worker functionality. It might be something that's broken only in a multi-node setup, or it could just be that you didn't update the nodes to the latest version.
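
One generic way to rule out mismatched binaries, assuming SSH access to each Pi and that the executable sits at the same path on every node (hosts and path below are placeholders), is to compare checksums of the dllama binary:

```
# compare the dllama binary on the root node against each worker
sha256sum ~/distributed-llama/dllama
for host in 192.168.2.212 192.168.2.213 192.168.2.214; do
  ssh ubuntu@"$host" 'sha256sum ~/distributed-llama/dllama'
done
```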

unclemusclez commented 4 months ago

I compile on the 3B+ and then scp it to the other 3B+ nodes. I was downloading TinyLlama on my Windows computer via WSL2 and converting it with the Python env in there. The most recent time, which I just posted here, I used the Python download script. I just rm'd all the dllama executables and then re-scp'd the executable. Same result.

ubuntu@ubuntu:~/distributed-llama$ sudo nice -n -20 dllama inference --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Tell me about yourself." --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998 
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 8
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🕒 ropeCache: 2048 kB
⏩ Loaded 824584 kB
đŸ”¶ G  401 ms I  315 ms T   86 ms S 466351 kB R    654 kB Tell
đŸ”¶ G  539 ms I  456 ms T   83 ms S    654 kB R    654 kB  me
đŸ”¶ G  436 ms I  325 ms T  111 ms S    654 kB R    654 kB  about
đŸ”¶ G  404 ms I  306 ms T   98 ms S    654 kB R    654 kB  yourself
đŸ”¶ G  384 ms I  306 ms T   77 ms S    654 kB R    654 kB .
đŸ”¶ G  392 ms I  304 ms T   78 ms S    654 kB R    654 kB fmt
đŸ”¶ G  391 ms I  305 ms T   77 ms S    654 kB R    654 kB ți
đŸ”¶ G  394 ms I  313 ms T   71 ms S    654 kB R    654 kB REE
đŸ”¶ G  373 ms I  269 ms T   94 ms S    654 kB R    654 kB int
đŸ”¶ G  391 ms I  309 ms T   72 ms S    654 kB R    654 kB DIS
đŸ”¶ G  397 ms I  320 ms T   67 ms S    654 kB R    654 kB  NUM
đŸ”¶ G  395 ms I  313 ms T   71 ms S    654 kB R    654 kB ART
đŸ”¶ G  396 ms I  310 ms T   75 ms S    654 kB R    654 kB  nad
đŸ”¶ G  395 ms I  304 ms T   81 ms S    654 kB R    654 kB  redirects
đŸ”¶ G  392 ms I  304 ms T   78 ms S    654 kB R    654 kB  qualified
đŸ”¶ G  393 ms I  305 ms T   79 ms S    654 kB R    654 kB help
đŸ”¶ G  365 ms I  280 ms T   80 ms S    654 kB R    654 kB  COUNT
đŸ”¶ G  376 ms I  271 ms T   94 ms S    654 kB R    654 kB is
đŸ”¶ G  394 ms I  309 ms T   75 ms S    654 kB R    654 kB T
đŸ”¶ G  396 ms I  312 ms T   76 ms S    654 kB R    654 kB npm
đŸ”¶ G  395 ms I  303 ms T   82 ms S    654 kB R    654 kB  -
đŸ”¶ G  393 ms I  310 ms T   74 ms S    654 kB R    654 kB noindent
đŸ”¶ G  391 ms I  309 ms T   73 ms S    654 kB R    654 kB ini
đŸ”¶ G  398 ms I  310 ms T   78 ms S    654 kB R    654 kB over
đŸ”¶ G  394 ms I  301 ms T   83 ms S    654 kB R    654 kB  \\
đŸ”¶ G  336 ms I  254 ms T   79 ms S    654 kB R    654 kB ve
đŸ”¶ G  379 ms I  291 ms T   77 ms S    654 kB R    654 kB  so
đŸ”¶ G  395 ms I  305 ms T   80 ms S    654 kB R    654 kB  cer
đŸ”¶ G  394 ms I  312 ms T   71 ms S    654 kB R    654 kB ĐČ
đŸ”¶ G  394 ms I  311 ms T   73 ms S    654 kB R    654 kB ~
đŸ”¶ G  394 ms I  294 ms T   91 ms S    654 kB R    654 kB on
đŸ”¶ G  395 ms I  300 ms T   84 ms S    654 kB R    654 kB ~
đŸ”¶ G  394 ms I  304 ms T   81 ms S    654 kB R    654 kB urale
đŸ”¶ G  394 ms I  308 ms T   75 ms S    654 kB R    654 kB ivers
đŸ”¶ G  324 ms I  243 ms T   77 ms S    654 kB R    654 kB jud
đŸ”¶ G  384 ms I  292 ms T   82 ms S    654 kB R    654 kB ute
đŸ”¶ G  399 ms I  316 ms T   73 ms S    654 kB R    654 kB --
đŸ”¶ G  392 ms I  306 ms T   77 ms S    654 kB R    654 kB ___
đŸ”¶ G  391 ms I  308 ms T   74 ms S    654 kB R    654 kB ~
đŸ”¶ G  395 ms I  302 ms T   84 ms S    654 kB R    654 kB ___
đŸ”¶ G  393 ms I  302 ms T   82 ms S    654 kB R    654 kB w
đŸ”¶ G  393 ms I  310 ms T   73 ms S    654 kB R    654 kB right
đŸ”¶ G  394 ms I  311 ms T   73 ms S    654 kB R    654 kB is
đŸ”¶ G  317 ms I  234 ms T   79 ms S    654 kB R    654 kB ˚
đŸ”¶ G  382 ms I  294 ms T   78 ms S    654 kB R    654 kB where
đŸ”¶ G  400 ms I  311 ms T   79 ms S    654 kB R    654 kB head
đŸ”¶ G  394 ms I  307 ms T   77 ms S    654 kB R    654 kB __
đŸ”¶ G  396 ms I  304 ms T   83 ms S    654 kB R    654 kB ----
đŸ”¶ G  395 ms I  305 ms T   80 ms S    654 kB R    654 kB ─
đŸ”¶ G  401 ms I  317 ms T   73 ms S    654 kB R    654 kB  `-
đŸ”¶ G  394 ms I  309 ms T   75 ms S    654 kB R    654 kB li
đŸ”¶ G  395 ms I  309 ms T   76 ms S    654 kB R    654 kB  from
đŸ”¶ G  307 ms I  220 ms T   83 ms S    654 kB R    654 kB __
đŸ”¶ G  384 ms I  298 ms T   77 ms S    654 kB R    654 kB idente
đŸ”¶ G  393 ms I  307 ms T   76 ms S    654 kB R    654 kB gen
đŸ”¶ G  395 ms I  315 ms T   70 ms S    654 kB R    654 kB wedge
đŸ”¶ G  394 ms I  314 ms T   71 ms S    654 kB R    654 kB unic
đŸ”¶ G  394 ms I  315 ms T   70 ms S    654 kB R    654 kB dim
đŸ”¶ G  394 ms I  307 ms T   77 ms S    654 kB R    654 kB weis
đŸ”¶ G  396 ms I  310 ms T   77 ms S    654 kB R    654 kB ligen
đŸ”¶ G  395 ms I  301 ms T   84 ms S    654 kB R    654 kB Ăș
đŸ”¶ G  304 ms I  224 ms T   76 ms S    654 kB R    654 kB wid
đŸ”¶ G  389 ms I  301 ms T   79 ms S    654 kB R    654 kB ute
đŸ”¶ G  396 ms I  309 ms T   78 ms S    654 kB R    654 kB w
Generated tokens:    64
Avg tokens / second: 2.57
Avg generation time: 389.53 ms
Avg inference time:  302.33 ms
Avg transfer time:   78.70 ms
DifferentialityDevelopment commented 4 months ago

That's so strange. I just did a test with multiple workers, running from the same machine instead of multiple machines, though it's x86 and not ARM.

Root: sudo nice -n 20 ./dllama inference --model ~/distributed-llama/models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer ~/distributed-llama/models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Love is" --workers 127.0.0.1:11211 💡 arch: llama 💡 hiddenAct: silu 💡 dim: 2048 💡 hiddenDim: 5632 💡 nLayers: 22 💡 nHeads: 32 💡 nKvHeads: 4 💡 vocabSize: 32000 💡 seqLen: 2048 💡 nSlices: 2 💡 ropeTheta: 10000.0 📄 bosId: 1 📄 eosId: 2 🕒 ropeCache: 8192 kB ⏩ Loaded 824584 kB đŸ”¶ G 63 ms I 42 ms T 21 ms S 266205 kB R 93 kB Love đŸ”¶ G 72 ms I 41 ms T 30 ms S 93 kB R 93 kB is đŸ”¶ G 73 ms I 41 ms T 32 ms S 93 kB R 93 kB Fore đŸ”¶ G 61 ms I 32 ms T 29 ms S 93 kB R 93 kB ver đŸ”¶ G 63 ms I 40 ms T 22 ms S 93 kB R 93 kB , đŸ”¶ G 61 ms I 42 ms T 19 ms S 93 kB R 93 kB I đŸ”¶ G 59 ms I 38 ms T 21 ms S 93 kB R 93 kB Can đŸ”¶ G 74 ms I 42 ms T 32 ms S 93 kB R 93 kB Only đŸ”¶ G 70 ms I 41 ms T 28 ms S 93 kB R 93 kB Im đŸ”¶ G 73 ms I 36 ms T 36 ms S 93 kB R 93 kB agine đŸ”¶ G 66 ms I 46 ms T 19 ms S 93 kB R 93 kB , đŸ”¶ G 63 ms I 36 ms T 26 ms S 93 kB R 93 kB Jo đŸ”¶ G 63 ms I 41 ms T 21 ms S 93 kB R 93 kB Jo đŸ”¶ G 59 ms I 40 ms T 19 ms S 93 kB R 93 kB Gun đŸ”¶ G 56 ms I 32 ms T 23 ms S 93 kB R 93 kB ne đŸ”¶ G 59 ms I 34 ms T 25 ms S 93 kB R 93 kB , đŸ”¶ G 69 ms I 33 ms T 35 ms S 93 kB R 93 kB Jer đŸ”¶ G 70 ms I 33 ms T 37 ms S 93 kB R 93 kB emy đŸ”¶ G 73 ms I 32 ms T 41 ms S 93 kB R 93 kB Camp đŸ”¶ G 77 ms I 41 ms T 36 ms S 93 kB R 93 kB , đŸ”¶ G 68 ms I 41 ms T 26 ms S 93 kB R 93 kB K đŸ”¶ G 72 ms I 39 ms T 33 ms S 93 kB R 93 kB aty đŸ”¶ G 75 ms I 37 ms T 38 ms S 93 kB R 93 kB Perry đŸ”¶ G 77 ms I 40 ms T 37 ms S 93 kB R 93 kB , đŸ”¶ G 77 ms I 42 ms T 34 ms S 93 kB R 93 kB Kid đŸ”¶ G 75 ms I 37 ms T 38 ms S 93 kB R 93 kB Rock đŸ”¶ G 78 ms I 42 ms T 35 ms S 93 kB R 93 kB , đŸ”¶ G 82 ms I 41 ms T 40 ms S 93 kB R 93 kB Lady đŸ”¶ G 82 ms I 42 ms T 40 ms S 93 kB R 93 kB An đŸ”¶ G 70 ms I 40 ms T 30 ms S 93 kB R 93 kB te đŸ”¶ G 74 ms I 39 ms T 35 ms S 93 kB R 93 kB bell đŸ”¶ G 69 ms I 43 ms T 26 ms S 93 kB R 93 kB um

Worker: ./dllama worker --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --port 11211

Both running from the same machine inside WSL.

I unfortunately don't have any ARM hardware to test with currently, but it could be related to that.

DifferentialityDevelopment commented 4 months ago

Another test

sudo nice -n 20 ./dllama inference --model ~/distributed-llama/models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer ~/distributed-llama/models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Python is a programming language that" --workers 127.0.0.1:11211 💡 arch: llama 💡 hiddenAct: silu 💡 dim: 2048 💡 hiddenDim: 5632 💡 nLayers: 22 💡 nHeads: 32 💡 nKvHeads: 4 💡 vocabSize: 32000 💡 seqLen: 2048 💡 nSlices: 2 💡 ropeTheta: 10000.0 📄 bosId: 1 📄 eosId: 2 🕒 ropeCache: 8192 kB ⏩ Loaded 824584 kB đŸ”¶ G 77 ms I 34 ms T 43 ms S 266205 kB R 93 kB Python đŸ”¶ G 72 ms I 29 ms T 43 ms S 93 kB R 93 kB is đŸ”¶ G 67 ms I 38 ms T 29 ms S 93 kB R 93 kB a đŸ”¶ G 75 ms I 40 ms T 35 ms S 93 kB R 93 kB programming đŸ”¶ G 65 ms I 32 ms T 33 ms S 93 kB R 93 kB language đŸ”¶ G 68 ms I 40 ms T 28 ms S 93 kB R 93 kB that đŸ”¶ G 71 ms I 39 ms T 32 ms S 93 kB R 93 kB is đŸ”¶ G 59 ms I 42 ms T 17 ms S 93 kB R 93 kB open đŸ”¶ G 67 ms I 30 ms T 37 ms S 93 kB R 93 kB source đŸ”¶ G 70 ms I 34 ms T 35 ms S 93 kB R 93 kB and đŸ”¶ G 57 ms I 43 ms T 14 ms S 93 kB R 93 kB free đŸ”¶ G 64 ms I 46 ms T 18 ms S 93 kB R 93 kB to đŸ”¶ G 59 ms I 46 ms T 13 ms S 93 kB R 93 kB use đŸ”¶ G 59 ms I 38 ms T 21 ms S 93 kB R 93 kB . đŸ”¶ G 61 ms I 47 ms T 14 ms S 93 kB R 93 kB It đŸ”¶ G 65 ms I 35 ms T 30 ms S 93 kB R 93 kB is đŸ”¶ G 68 ms I 42 ms T 25 ms S 93 kB R 93 kB designed đŸ”¶ G 61 ms I 38 ms T 23 ms S 93 kB R 93 kB for đŸ”¶ G 65 ms I 46 ms T 19 ms S 93 kB R 93 kB ease đŸ”¶ G 61 ms I 37 ms T 24 ms S 93 kB R 93 kB of đŸ”¶ G 75 ms I 33 ms T 42 ms S 93 kB R 93 kB use đŸ”¶ G 71 ms I 38 ms T 33 ms S 93 kB R 93 kB , đŸ”¶ G 68 ms I 30 ms T 38 ms S 93 kB R 93 kB flex đŸ”¶ G 72 ms I 36 ms T 36 ms S 93 kB R 93 kB ibility đŸ”¶ G 73 ms I 38 ms T 35 ms S 93 kB R 93 kB and đŸ”¶ G 71 ms I 40 ms T 30 ms S 93 kB R 93 kB efficiency đŸ”¶ G 69 ms I 34 ms T 35 ms S 93 kB R 93 kB .

I'm going to check if I can spin up a VM on Azure to test whether it's an ARM-specific issue.

unclemusclez commented 4 months ago

WSL HOST:

musclez@NSA:~/distributed-llama$ sudo nice -n -20 ./dllama inference --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Tell me about yourself." --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998
[sudo] password for musclez:
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 8
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🕒 ropeCache: 2048 kB
⏩ Loaded 824584 kB
đŸ”¶ G  240 ms I   11 ms T  229 ms S 466351 kB R    654 kB Tell
đŸ”¶ G  197 ms I   16 ms T  181 ms S    654 kB R    654 kB  me
đŸ”¶ G  223 ms I   13 ms T  210 ms S    654 kB R    654 kB  about
đŸ”¶ G  221 ms I   12 ms T  209 ms S    654 kB R    654 kB  yourself
đŸ”¶ G  219 ms I   10 ms T  209 ms S    654 kB R    654 kB .
đŸ”¶ G  235 ms I   11 ms T  223 ms S    654 kB R    654 kB rows
đŸ”¶ G  232 ms I   12 ms T  219 ms S    654 kB R    654 kB otti
đŸ”¶ G  232 ms I    9 ms T  223 ms S    654 kB R    654 kB where
đŸ”¶ G  266 ms I   15 ms T  250 ms S    654 kB R    654 kB otti
đŸ”¶ G  202 ms I   12 ms T  189 ms S    654 kB R    654 kB ti
đŸ”¶ G  197 ms I   13 ms T  183 ms S    654 kB R    654 kB ining
đŸ”¶ G  203 ms I   13 ms T  189 ms S    654 kB R    654 kB >
đŸ”¶ G  195 ms I   10 ms T  183 ms S    654 kB R    654 kB uden
đŸ”¶ G  199 ms I   12 ms T  187 ms S    654 kB R    654 kB  there
đŸ”¶ G  203 ms I   12 ms T  190 ms S    654 kB R    654 kB ered
đŸ”¶ G  214 ms I   12 ms T  201 ms S    654 kB R    654 kB COM
đŸ”¶ G  207 ms I    8 ms T  198 ms S    654 kB R    654 kB otti
đŸ”¶ G  210 ms I   10 ms T  199 ms S    654 kB R    654 kB otti
đŸ”¶ G  213 ms I   11 ms T  202 ms S    654 kB R    654 kB  Overflow
đŸ”¶ G  211 ms I   15 ms T  196 ms S    654 kB R    654 kB nav
đŸ”¶ G  213 ms I   13 ms T  199 ms S    654 kB R    654 kB nav
đŸ”¶ G  195 ms I   13 ms T  180 ms S    654 kB R    654 kB isti
đŸ”¶ G  204 ms I   11 ms T  191 ms S    654 kB R    654 kB  enough
đŸ”¶ G  222 ms I    9 ms T  211 ms S    654 kB R    654 kB  sigu
đŸ”¶ G  221 ms I   18 ms T  200 ms S    654 kB R    654 kB  Beginn
đŸ”¶ G  218 ms I   15 ms T  202 ms S    654 kB R    654 kB ani
đŸ”¶ G  220 ms I   14 ms T  205 ms S    654 kB R    654 kB  Overflow
đŸ”¶ G  198 ms I   12 ms T  185 ms S    654 kB R    654 kB otti
đŸ”¶ G  205 ms I   15 ms T  189 ms S    654 kB R    654 kB  Jazz
đŸ”¶ G  206 ms I   10 ms T  195 ms S    654 kB R    654 kB nu
đŸ”¶ G  197 ms I   11 ms T  186 ms S    654 kB R    654 kB Đ»ĐžĐŒĐżĐž
đŸ”¶ G  200 ms I   13 ms T  185 ms S    654 kB R    654 kB otti
đŸ”¶ G  194 ms I    9 ms T  184 ms S    654 kB R    654 kB  Overflow
đŸ”¶ G  204 ms I   11 ms T  191 ms S    654 kB R    654 kB {}
đŸ”¶ G  207 ms I   14 ms T  192 ms S    654 kB R    654 kB gen
đŸ”¶ G  216 ms I   18 ms T  197 ms S    654 kB R    654 kB  Overflow
đŸ”¶ G  260 ms I   14 ms T  245 ms S    654 kB R    654 kB otti
đŸ”¶ G  217 ms I    9 ms T  207 ms S    654 kB R    654 kB atti
đŸ”¶ G  219 ms I   15 ms T  203 ms S    654 kB R    654 kB  Frei
đŸ”¶ G  207 ms I   12 ms T  194 ms S    654 kB R    654 kB dk
đŸ”¶ G  232 ms I   12 ms T  219 ms S    654 kB R    654 kB  Overflow
đŸ”¶ G  213 ms I   10 ms T  203 ms S    654 kB R    654 kB  Gar
đŸ”¶ G  223 ms I   16 ms T  206 ms S    654 kB R    654 kB  Overflow
đŸ”¶ G  199 ms I   14 ms T  184 ms S    654 kB R    654 kB  Gib
đŸ”¶ G  215 ms I    9 ms T  205 ms S    654 kB R    654 kB  Hunter
đŸ”¶ G  222 ms I   10 ms T  211 ms S    654 kB R    654 kB Ășn
đŸ”¶ G  220 ms I    9 ms T  209 ms S    654 kB R    654 kB agu
đŸ”¶ G  220 ms I   16 ms T  203 ms S    654 kB R    654 kB  Government
đŸ”¶ G  205 ms I   10 ms T  194 ms S    654 kB R    654 kB  Overflow
đŸ”¶ G  196 ms I    9 ms T  186 ms S    654 kB R    654 kB otto
đŸ”¶ G  198 ms I   11 ms T  186 ms S    654 kB R    654 kB amps
đŸ”¶ G  222 ms I   10 ms T  211 ms S    654 kB R    654 kB  Overflow
đŸ”¶ G  200 ms I   18 ms T  180 ms S    654 kB R    654 kB  Overflow
đŸ”¶ G  195 ms I   11 ms T  183 ms S    654 kB R    654 kB  Name
đŸ”¶ G  200 ms I   12 ms T  187 ms S    654 kB R    654 kB  vis
đŸ”¶ G  209 ms I   11 ms T  197 ms S    654 kB R    654 kB  Jenkins
đŸ”¶ G  237 ms I   12 ms T  224 ms S    654 kB R    654 kB app
đŸ”¶ G  205 ms I   19 ms T  185 ms S    654 kB R    654 kB  Party
đŸ”¶ G  195 ms I   11 ms T  184 ms S    654 kB R    654 kB amps
đŸ”¶ G  209 ms I   12 ms T  196 ms S    654 kB R    654 kB  Overflow
đŸ”¶ G  202 ms I   15 ms T  186 ms S    654 kB R    654 kB  Overflow
đŸ”¶ G  212 ms I   10 ms T  201 ms S    654 kB R    654 kB Overflow
đŸ”¶ G  193 ms I   14 ms T  178 ms S    654 kB R    654 kB quipe
đŸ”¶ G  206 ms I   14 ms T  191 ms S    654 kB R    654 kB utes
Generated tokens:    64
Avg tokens / second: 4.72
Avg generation time: 212.03 ms
Avg inference time:  12.31 ms
Avg transfer time:   198.75 ms
DifferentialityDevelopment commented 4 months ago

I just created an EC2 ARM VM and ran the same test there; it worked perfectly fine. So the issue doesn't seem to be ARM-specific, at the very least. Not quite sure what is going on...

DifferentialityDevelopment commented 4 months ago

Perhaps try just the WSL root node, then add workers one at a time; maybe a single faulty worker is affecting the others. Either way, something strange is going on.
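Something along these lines might make that easier to script. This is only a rough sketch, assuming the TinyLlama paths used earlier in this thread, the seven Pi workers on 192.168.2.212–218:9998, and that a root-only run without --workers is allowed; adjust to your setup:

#!/bin/bash
# Hypothetical helper: rerun the same prompt while growing the worker list
# one address at a time, so the first failing cluster size stands out.
ALL_WORKERS=(192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 \
             192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 \
             192.168.2.218:9998)
MODEL=models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m
TOKENIZER=models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t

run() {  # $@ = worker addresses for this attempt (may be empty)
  local workers=()
  [ "$#" -gt 0 ] && workers=(--workers "$@")
  echo "=== root + $# worker(s) ==="
  sudo nice -n -20 ./dllama inference --model "$MODEL" --tokenizer "$TOKENIZER" \
    --weights-float-type q40 --buffer-float-type q80 \
    --nthreads 4 --steps 64 --prompt "Tell me about yourself." \
    "${workers[@]}"
}

run                                          # root node only
for n in $(seq 1 ${#ALL_WORKERS[@]}); do
  run "${ALL_WORKERS[@]:0:$n}"               # root + the first n workers
done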

unclemusclez commented 4 months ago

4 work, 8 do not. This was the same whether WSL or the Pi was acting as the inference (root) node.

On WSL, however, you can see that it's actually saying "Overflow" when 8 are run. Intriguing.

from above:

đŸ”¶ G  209 ms I   12 ms T  196 ms S    654 kB R    654 kB  Overflow
đŸ”¶ G  202 ms I   15 ms T  186 ms S    654 kB R    654 kB  Overflow
đŸ”¶ G  212 ms I   10 ms T  201 ms S    654 kB R    654 kB Overflow

4x working:

 sudo nice -n -20 ./dllama inference --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Tell me about yourself." --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 4
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🕒 ropeCache: 4096 kB
⏩ Loaded 824584 kB
đŸ”¶ G  352 ms I   13 ms T  339 ms S 399448 kB R    280 kB Tell
đŸ”¶ G  323 ms I   12 ms T  311 ms S    280 kB R    280 kB  me
đŸ”¶ G  371 ms I   18 ms T  353 ms S    280 kB R    280 kB  about
đŸ”¶ G  344 ms I   21 ms T  322 ms S    280 kB R    280 kB  yourself
đŸ”¶ G  337 ms I   14 ms T  323 ms S    280 kB R    280 kB .
đŸ”¶ G  365 ms I   19 ms T  346 ms S    280 kB R    280 kB

đŸ”¶ G  353 ms I   14 ms T  339 ms S    280 kB R    280 kB NO
đŸ”¶ G  358 ms I   12 ms T  346 ms S    280 kB R    280 kB W
đŸ”¶ G  315 ms I   17 ms T  298 ms S    280 kB R    280 kB  A
đŸ”¶ G  344 ms I   15 ms T  329 ms S    280 kB R    280 kB BO
đŸ”¶ G  336 ms I   20 ms T  316 ms S    280 kB R    280 kB UT
đŸ”¶ G  364 ms I   12 ms T  352 ms S    280 kB R    280 kB  Y
đŸ”¶ G  336 ms I   14 ms T  322 ms S    280 kB R    280 kB OU
đŸ”¶ G  350 ms I   19 ms T  331 ms S    280 kB R    280 kB :
đŸ”¶ G  347 ms I   13 ms T  333 ms S    280 kB R    280 kB  What
đŸ”¶ G  369 ms I   13 ms T  356 ms S    280 kB R    280 kB  was
đŸ”¶ G  350 ms I   17 ms T  333 ms S    280 kB R    280 kB  your
đŸ”¶ G  404 ms I   16 ms T  388 ms S    280 kB R    280 kB  first
đŸ”¶ G  338 ms I   15 ms T  323 ms S    280 kB R    280 kB  job
đŸ”¶ G  319 ms I   14 ms T  305 ms S    280 kB R    280 kB ?
đŸ”¶ G  436 ms I   19 ms T  416 ms S    280 kB R    280 kB

đŸ”¶ G  336 ms I   22 ms T  314 ms S    280 kB R    280 kB It
đŸ”¶ G  328 ms I   16 ms T  312 ms S    280 kB R    280 kB  was
đŸ”¶ G  362 ms I   16 ms T  346 ms S    280 kB R    280 kB  a
đŸ”¶ G  342 ms I   15 ms T  327 ms S    280 kB R    280 kB  ret
đŸ”¶ G  337 ms I   14 ms T  323 ms S    280 kB R    280 kB ail
đŸ”¶ G  395 ms I   19 ms T  375 ms S    280 kB R    280 kB  job
đŸ”¶ G  343 ms I   18 ms T  325 ms S    280 kB R    280 kB ,
đŸ”¶ G  345 ms I   16 ms T  329 ms S    280 kB R    280 kB  but
đŸ”¶ G  392 ms I   20 ms T  372 ms S    280 kB R    280 kB  I
đŸ”¶ G  330 ms I   14 ms T  315 ms S    280 kB R    280 kB  was
đŸ”¶ G  401 ms I   16 ms T  385 ms S    280 kB R    280 kB  always
đŸ”¶ G  355 ms I   23 ms T  332 ms S    280 kB R    280 kB  interested
đŸ”¶ G  369 ms I   17 ms T  351 ms S    280 kB R    280 kB  in
đŸ”¶ G  409 ms I   18 ms T  390 ms S    280 kB R    280 kB  writing
đŸ”¶ G  349 ms I   15 ms T  334 ms S    280 kB R    280 kB .
đŸ”¶ G  344 ms I   17 ms T  327 ms S    280 kB R    280 kB  I
đŸ”¶ G  436 ms I   12 ms T  424 ms S    280 kB R    280 kB  read
đŸ”¶ G  333 ms I   14 ms T  319 ms S    280 kB R    280 kB  lots
đŸ”¶ G  350 ms I   18 ms T  331 ms S    280 kB R    280 kB  of
đŸ”¶ G  362 ms I   13 ms T  348 ms S    280 kB R    280 kB  books
đŸ”¶ G  359 ms I   18 ms T  341 ms S    280 kB R    280 kB  and
đŸ”¶ G  428 ms I   18 ms T  410 ms S    280 kB R    280 kB  went
đŸ”¶ G  331 ms I   15 ms T  316 ms S    280 kB R    280 kB  to
đŸ”¶ G  356 ms I   15 ms T  341 ms S    280 kB R    280 kB  university
đŸ”¶ G  383 ms I   20 ms T  363 ms S    280 kB R    280 kB  to
đŸ”¶ G  325 ms I   16 ms T  309 ms S    280 kB R    280 kB  do
đŸ”¶ G  359 ms I   12 ms T  347 ms S    280 kB R    280 kB  a
đŸ”¶ G  365 ms I   16 ms T  349 ms S    280 kB R    280 kB  B
đŸ”¶ G  322 ms I   15 ms T  306 ms S    280 kB R    280 kB A
đŸ”¶ G  349 ms I   19 ms T  330 ms S    280 kB R    280 kB  in
đŸ”¶ G  409 ms I   21 ms T  388 ms S    280 kB R    280 kB  English
đŸ”¶ G  330 ms I   14 ms T  316 ms S    280 kB R    280 kB .
đŸ”¶ G  356 ms I   13 ms T  343 ms S    280 kB R    280 kB

đŸ”¶ G  373 ms I   18 ms T  355 ms S    280 kB R    280 kB HO
đŸ”¶ G  317 ms I   14 ms T  302 ms S    280 kB R    280 kB W
đŸ”¶ G  398 ms I   14 ms T  384 ms S    280 kB R    280 kB  W
đŸ”¶ G  347 ms I   15 ms T  332 ms S    280 kB R    280 kB AS
đŸ”¶ G  332 ms I   14 ms T  318 ms S    280 kB R    280 kB  IT
đŸ”¶ G  388 ms I   22 ms T  366 ms S    280 kB R    280 kB  ME
đŸ”¶ G  349 ms I   17 ms T  332 ms S    280 kB R    280 kB ET
đŸ”¶ G  324 ms I   19 ms T  305 ms S    280 kB R    280 kB ING
đŸ”¶ G  358 ms I   13 ms T  345 ms S    280 kB R    280 kB  Y
đŸ”¶ G  345 ms I   18 ms T  327 ms S    280 kB R    280 kB OUR
Generated tokens:    64
Avg tokens / second: 2.80
Avg generation time: 356.75 ms
Avg inference time:  16.19 ms
Avg transfer time:   340.39 ms
ubuntu@ubuntu:~/distributed-llama$ sudo nice -n -20 ./dllama inference --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Tell me about yourself." --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 4
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🕒 ropeCache: 4096 kB
⏩ Loaded 824584 kB
đŸ”¶ G  506 ms I  412 ms T   94 ms S 399448 kB R    280 kB Tell
đŸ”¶ G  543 ms I  458 ms T   85 ms S    280 kB R    280 kB  me
đŸ”¶ G  486 ms I  432 ms T   54 ms S    280 kB R    280 kB  about
đŸ”¶ G  486 ms I  424 ms T   62 ms S    280 kB R    280 kB  yourself
đŸ”¶ G  488 ms I  428 ms T   60 ms S    280 kB R    280 kB .
đŸ”¶ G  493 ms I  426 ms T   62 ms S    280 kB R    280 kB

đŸ”¶ G  437 ms I  373 ms T   59 ms S    280 kB R    280 kB Over
đŸ”¶ G  486 ms I  399 ms T   82 ms S    280 kB R    280 kB all
đŸ”¶ G  525 ms I  425 ms T   95 ms S    280 kB R    280 kB ,
đŸ”¶ G  497 ms I  421 ms T   70 ms S    280 kB R    280 kB  I
đŸ”¶ G  489 ms I  426 ms T   59 ms S    280 kB R    280 kB  want
đŸ”¶ G  489 ms I  431 ms T   53 ms S    280 kB R    280 kB  to
đŸ”¶ G  494 ms I  434 ms T   55 ms S    280 kB R    280 kB  create
đŸ”¶ G  452 ms I  389 ms T   61 ms S    280 kB R    280 kB  a
đŸ”¶ G  486 ms I  395 ms T   85 ms S    280 kB R    280 kB  product
đŸ”¶ G  489 ms I  425 ms T   60 ms S    280 kB R    280 kB  that
đŸ”¶ G  491 ms I  427 ms T   59 ms S    280 kB R    280 kB  allows
đŸ”¶ G  489 ms I  423 ms T   61 ms S    280 kB R    280 kB  people
đŸ”¶ G  492 ms I  429 ms T   58 ms S    280 kB R    280 kB  to
đŸ”¶ G  492 ms I  433 ms T   54 ms S    280 kB R    280 kB  eng
đŸ”¶ G  487 ms I  425 ms T   60 ms S    280 kB R    280 kB age
đŸ”¶ G  482 ms I  377 ms T  101 ms S    280 kB R    280 kB  with
đŸ”¶ G  491 ms I  424 ms T   62 ms S    280 kB R    280 kB  nature
đŸ”¶ G  491 ms I  429 ms T   57 ms S    280 kB R    280 kB  and
đŸ”¶ G  492 ms I  430 ms T   57 ms S    280 kB R    280 kB  have
đŸ”¶ G  491 ms I  426 ms T   60 ms S    280 kB R    280 kB  a
đŸ”¶ G  490 ms I  429 ms T   57 ms S    280 kB R    280 kB  real
đŸ”¶ G  490 ms I  428 ms T   57 ms S    280 kB R    280 kB  connection
đŸ”¶ G  481 ms I  373 ms T  104 ms S    280 kB R    280 kB  with
đŸ”¶ G  498 ms I  432 ms T   62 ms S    280 kB R    280 kB  the
đŸ”¶ G  496 ms I  439 ms T   53 ms S    280 kB R    280 kB  out
đŸ”¶ G  491 ms I  430 ms T   56 ms S    280 kB R    280 kB do
đŸ”¶ G  490 ms I  434 ms T   51 ms S    280 kB R    280 kB ors
đŸ”¶ G  496 ms I  440 ms T   52 ms S    280 kB R    280 kB .
đŸ”¶ G  490 ms I  431 ms T   54 ms S    280 kB R    280 kB 

đŸ”¶ G  482 ms I  380 ms T   97 ms S    280 kB R    280 kB My
đŸ”¶ G  496 ms I  426 ms T   65 ms S    280 kB R    280 kB  main
đŸ”¶ G  492 ms I  426 ms T   61 ms S    280 kB R    280 kB  goal
đŸ”¶ G  491 ms I  431 ms T   56 ms S    280 kB R    280 kB  for
đŸ”¶ G  492 ms I  430 ms T   57 ms S    280 kB R    280 kB  the
đŸ”¶ G  498 ms I  430 ms T   63 ms S    280 kB R    280 kB  next
đŸ”¶ G  490 ms I  427 ms T   59 ms S    280 kB R    280 kB  year
đŸ”¶ G  481 ms I  374 ms T  103 ms S    280 kB R    280 kB  is
đŸ”¶ G  491 ms I  430 ms T   57 ms S    280 kB R    280 kB  to
đŸ”¶ G  491 ms I  427 ms T   59 ms S    280 kB R    280 kB  work
đŸ”¶ G  490 ms I  424 ms T   62 ms S    280 kB R    280 kB  on
đŸ”¶ G  491 ms I  429 ms T   57 ms S    280 kB R    280 kB  the
đŸ”¶ G  493 ms I  435 ms T   52 ms S    280 kB R    280 kB  R
đŸ”¶ G  492 ms I  431 ms T   56 ms S    280 kB R    280 kB ise
đŸ”¶ G  485 ms I  375 ms T  105 ms S    280 kB R    280 kB  +
đŸ”¶ G  489 ms I  429 ms T   55 ms S    280 kB R    280 kB  Fl
đŸ”¶ G  491 ms I  432 ms T   55 ms S    280 kB R    280 kB ight
đŸ”¶ G  494 ms I  435 ms T   53 ms S    280 kB R    280 kB  brand
đŸ”¶ G  496 ms I  444 ms T   48 ms S    280 kB R    280 kB .
đŸ”¶ G  492 ms I  428 ms T   60 ms S    280 kB R    280 kB  I
đŸ”¶ G  491 ms I  429 ms T   58 ms S    280 kB R    280 kB  want
đŸ”¶ G  487 ms I  374 ms T  109 ms S    280 kB R    280 kB  to
đŸ”¶ G  492 ms I  435 ms T   53 ms S    280 kB R    280 kB  create
đŸ”¶ G  492 ms I  428 ms T   60 ms S    280 kB R    280 kB  a
đŸ”¶ G  496 ms I  430 ms T   61 ms S    280 kB R    280 kB  brand
đŸ”¶ G  497 ms I  433 ms T   60 ms S    280 kB R    280 kB  that
đŸ”¶ G  493 ms I  431 ms T   57 ms S    280 kB R    280 kB  allows
đŸ”¶ G  493 ms I  436 ms T   52 ms S    280 kB R    280 kB  people
đŸ”¶ G  483 ms I  372 ms T  106 ms S    280 kB R    280 kB  to
Generated tokens:    64
Avg tokens / second: 2.04
Avg generation time: 490.73 ms
Avg inference time:  421.38 ms
Avg transfer time:   65.11 ms
b4rtaz commented 4 months ago

Could you try running 8 workers but with a single thread (--nthreads 1)?
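For reference, that test would look roughly like this; it just reuses the TinyLlama paths and worker addresses already posted above, with only --nthreads changed on both the workers and the root node (treat the exact paths and ports as placeholders for your setup):

# On each worker device (same port as before):
sudo nice -n -20 ./dllama worker --port 9998 --nthreads 1

# On the root node (7 workers + root = 8 nodes):
sudo nice -n -20 ./dllama inference \
  --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m \
  --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t \
  --weights-float-type q40 --buffer-float-type q80 \
  --nthreads 1 --steps 64 --prompt "Tell me about yourself." \
  --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 \
            192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 \
            192.168.2.218:9998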

DifferentialityDevelopment commented 4 months ago

He could also try running funcs-test on all the Pis.

b4rtaz commented 4 months ago

I reproduced the problem. 8 nodes with 4 threads generate spaghetti output. I'll look into this.

⏩ Loaded 824584 kB
đŸ”¶ G 8052 ms I 4891 ms T 3161 ms S 466351 kB R    654 kB Hello
đŸ”¶ G 6765 ms I 4108 ms T 2657 ms S    654 kB R    654 kB  world
đŸ”¶ G 11431 ms I 7125 ms T 4306 ms S    654 kB R    654 kB !
đŸ”¶ G 10778 ms I 6435 ms T 4342 ms S    654 kB R    654 kB m
đŸ”¶ G 10806 ms I 6676 ms T 4130 ms S    654 kB R    654 kB row
đŸ”¶ G 12481 ms I 6907 ms T 5573 ms S    654 kB R    654 kB M
đŸ”¶ G 11464 ms I 6865 ms T 4598 ms S    654 kB R    654 kB NO

Update: the same happens with 8 nodes and 1 thread:

đŸ”¶ G   62 ms I   43 ms T   19 ms S 466351 kB R    654 kB Hello
đŸ”¶ G   51 ms I   35 ms T   16 ms S    654 kB R    654 kB  world
đŸ”¶ G   46 ms I   34 ms T   12 ms S    654 kB R    654 kB !
đŸ”¶ G   48 ms I   38 ms T   10 ms S    654 kB R    654 kB  Dev
đŸ”¶ G   49 ms I   31 ms T   18 ms S    654 kB R    654 kB ori
đŸ”¶ G   50 ms I   36 ms T   13 ms S    654 kB R    654 kB IC
đŸ”¶ G   46 ms I   41 ms T    5 ms S    654 kB R    654 kB M
đŸ”¶ G   43 ms I   33 ms T   10 ms S    654 kB R    654 kB  to
đŸ”¶ G   46 ms I   33 ms T   12 ms S    654 kB R    654 kB web
đŸ”¶ G   49 ms I   38 ms T   11 ms S    654 kB R    654 kB +
đŸ”¶ G   52 ms I   32 ms T   20 ms S    654 kB R    654 kB small

Update: This problem appears with TinyLlama. Llama 3 8B works ok.

unclemusclez commented 4 months ago

https://huggingface.co/keeeeenw/MicroLlama/tree/main

I was looking into this, but there is no tokenizer.model, and I don't know enough about conversion yet. As I understand it, we're looking for HF Llama models that use the SentencePiece tokenizer, or Llama 3 models.

If there were some external documentation I could refer to, I would try working with some other lightweight models that might fit in the 1 GB of memory.

I just got some 2 GB SBCs in the mail, so I could try mixing and matching a bit to meet the memory demands of Llama 3. I may also just use TinyLlama with 4 Pis; that worked, so I don't really need 8.

b4rtaz commented 4 months ago

@unclemusclez the mystery is solved. TinyLlama has nKvHeads=4, so that is the maximum number of nodes for now. Later I'll add an error message to the app.
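Until that check lands in the app, a rough shell-side guard in front of the launch command can catch it. This is only a sketch: the 4-head limit for TinyLlama and the node count (workers + 1) come from this thread; the rest (variable names, worker list) is illustrative.

#!/bin/bash
# Hypothetical guard: refuse to launch when root + workers exceeds the
# model's nKvHeads (4 for TinyLlama 1.1B, per the 💡 nKvHeads line above).
N_KV_HEADS=4
WORKERS=(192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998)
NODES=$(( ${#WORKERS[@]} + 1 ))   # root node + workers

if [ "$NODES" -gt "$N_KV_HEADS" ]; then
  echo "Refusing to start: $NODES nodes > nKvHeads=$N_KV_HEADS" >&2
  exit 1
fi

sudo nice -n -20 ./dllama inference \
  --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m \
  --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t \
  --weights-float-type q40 --buffer-float-type q80 \
  --nthreads 4 --steps 64 --prompt "Tell me about yourself." \
  --workers "${WORKERS[@]}"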