Benchmark with an Nvidia CMP 30HX (6 GB RAM, GTX 1660 Super equivalent)

Benchmark with an Nvidia GTX 1060 (3 GB RAM, low-cost version of the GTX 1060)

```
Average of eval rate: 2.868 tokens/s
----------------------------------------
```
Raspberry Pi 4 - 2 GB SBC with Debian

```
llm_benchmark run --no-sendinfo
-------Linux----------
```
Raspberry Pi 4 - 4 GB SBC with Debian

```
# llm_benchmark run --no-sendinfo
-------Linux----------
Average of eval rate: 1.922 tokens/s
```
Raspberry Pi 5 - 4 GB

```
$ llm_benchmark run --no-sendinfo
-------Linux----------
Average of eval rate: 5.512 tokens/s
----------------------------------------
```
Integration is done: v0.3.20 is on PyPI, https://pypi.org/project/llm-benchmark/
Thank you very much :-)
@dan-and qwen2 is the latest one. In your testing environment, which one do you recommend? I am going to retire qwen and put in qwen2. For phi from Microsoft, I think phi3:3.8b is good enough with the latest version, so I am thinking of retiring phi:2.7b. Any opinion is appreciated. Thank you.
Hey Jason,
I'm totally fine with retiring old models. However, please keep in mind that memory-restricted systems should still be able to run the benchmark.
benchmark_models_2gb_ram.yml: A substitute for qwen:1.8b would be qwen2:1.5b, which uses just 1.3 GB of RAM (see the sketch below).
benchmark_models_3gb_ram.yml: phi:2.7b is a sweet spot memory-wise at 2.4 GB, so it fits nicely in old GPU cards like the 1060 3 GB. It is also a great option for 4 GB SBCs, as it leaves enough headroom to avoid swapping.
phi3:3.8b, on the other hand, uses 3.4 GB, which no longer fits.
I totally understand that you want to retire old models, but I have no good sub-3 GB substitute based on phi3.
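For illustration, the qwen substitution in the 2 GB list could look like the sketch below. The file name comes from this thread; the schema (a simple models list) is an assumption rather than the tool's confirmed format, and only the model tag and memory figure come from the discussion above.

```yaml
# benchmark_models_2gb_ram.yml -- sketch only; the real schema may differ
models:
  - model: qwen2:1.5b   # substitute for qwen:1.8b, uses ~1.3 GB of RAM
```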
Just my 2 cents
OK... I'll follow your suggestion. Here is the integration result on PyPI: 0.3.21 https://pypi.org/project/llm-benchmark/#history
Thanks for this straightforward benchmarking tool for ollama.
As small systems with low amounts of memory but added NPUs are on the horizon, I slightly adjusted the scripts to support systems with 4 GB or less, as well as a super-low-memory variant with just 2 GB of RAM (see the sketch after this paragraph).
This also helps to measure the performance of the smallest GPUs without penalties from CPU/GPU workload splits.
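A minimal sketch of the tier-selection idea, assuming the adjusted scripts pick a model list based on total system RAM. The 2 GB and 3 GB file names appear elsewhere in this thread; the default file name and the exact thresholds are assumptions.

```sh
# Hypothetical tier selection -- not the actual script.
# Round total RAM down to whole GB and pick a model list for that tier.
total_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
total_gb=$(( total_kb / 1024 / 1024 ))

if   [ "$total_gb" -lt 3 ]; then models=benchmark_models_2gb_ram.yml
elif [ "$total_gb" -lt 4 ]; then models=benchmark_models_3gb_ram.yml   # 4 GB boxes round down to 3
else                             models=benchmark_models.yml           # assumed default name
fi

echo "Selected $models for a system with ~${total_gb} GB of RAM"
```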
I will add a few comments with results from various machine combinations, including Raspberry Pis.
I never uploaded them, as additional model variants were added and I don't know how your backend will react.