b4rtaz closed 1 month ago
nTokens = 90, buffer = Q80
4 x Raspberry Pi 5 8GB
Version | Avg tokens / second | Avg generation time | Avg inference time | Avg transfer time |
---|---|---|---|---|
This PR | 4.08 | 245.08 ms | 169.33 ms | 75.34 ms |
0.7.0 | 3.90 | 256.23 ms | 168.77 ms | 87.12 ms |
0.6.0 | 4.24 | 235.69 ms | 143.44 ms | 91.77 ms |
2 x Raspberry Pi 5 8GB
Version | Avg tokens / second | Avg generation time | Avg inference time | Avg transfer time |
---|---|---|---|---|
This PR | 3.07 | 325.46 ms | 269.04 ms | 56.39 ms |
0.7.0 | 2.91 | 343.44 ms | 266.51 ms | 76.87 ms |
0.6.0 | 3.06 | 327.17 ms | 249.80 ms | 77.28 ms |
nTokens = 128, buffer = Q80
2 x Raspberry Pi 5 8GB
Version | Avg tokens / second | Avg generation time | Avg inference time | Avg transfer time |
---|---|---|---|---|
This PR | 16.86 | 59.31 ms | 50.37 ms | 8.58 ms |
0.7.0 | 15.17 | 65.93 ms | 52.07 ms | 13.45 ms |
nTokens = 90, buffer = Q80
2 x AMD EPYC 7402P 24-Core Processor
Version | Avg tokens / second | Avg generation time | Avg inference time | Avg transfer time |
---|---|---|---|---|
This PR | 13.04 | 76.67 ms | 45.33 ms | 30.93 ms |
0.7.0 | 12.79 | 78.21 ms | 46.30 ms | 31.49 ms |
0.6.0 | 12.55 | 79.71 ms | 47.08 ms | 32.22 ms |
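
For anyone reading the tables: the columns appear to be related by simple arithmetic. Tokens per second looks like 1000 divided by the average generation time in ms, and generation time is roughly inference time plus transfer time. The sketch below checks this against the 4 x Raspberry Pi 5 rows (nTokens = 90). The function names are hypothetical and are not part of the project; the relations are an assumption inferred from the reported numbers, not something taken from the benchmark code.

```python
# Rows copied from the 4 x Raspberry Pi 5 8GB table (nTokens = 90, buffer = Q80).
# Each tuple: (version, reported tokens/s, generation ms, inference ms, transfer ms)
ROWS = [
    ("This PR", 4.08, 245.08, 169.33, 75.34),
    ("0.7.0", 3.90, 256.23, 168.77, 87.12),
    ("0.6.0", 4.24, 235.69, 143.44, 91.77),
]


def tokens_per_second(generation_ms):
    """Assumed relation: one token per average generation time."""
    return 1000.0 / generation_ms


def unaccounted_ms(generation_ms, inference_ms, transfer_ms):
    """Part of the generation time not covered by inference + transfer."""
    return generation_ms - inference_ms - transfer_ms


def transfer_share(generation_ms, transfer_ms):
    """Fraction of the generation time spent on network transfer."""
    return transfer_ms / generation_ms


if __name__ == "__main__":
    for version, reported, gen_ms, inf_ms, xfer_ms in ROWS:
        print(
            f"{version}: reported {reported:.2f} tok/s, "
            f"derived {tokens_per_second(gen_ms):.2f} tok/s, "
            f"transfer share {transfer_share(gen_ms, xfer_ms):.1%}, "
            f"unaccounted {unaccounted_ms(gen_ms, inf_ms, xfer_ms):.2f} ms"
        )
```

Under these assumptions the derived tokens per second match the reported column to two decimals, and inference plus transfer accounts for all but about half a millisecond of each generation time. The transfer share also makes the effect of the PR easier to see than the raw milliseconds do.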
Transfer / token
Model: dllama_meta-llama-3-8b_q40.bin, Buffer: Q80