Open invisiblepancake opened 1 month ago
(For "history": bench with llamafile v0.8.6)
Some results from my Zen 3 machine with 128 GB of RAM / Linux (fc40). RAM: DDR4@3600
./llamafile-bench-0.8.6 -p "256,512,1024" -m "Mistral-7b-instruct-v0.2.Q6_K.llamafile,Mistral-7b-instruct-v0.2.Q8_0.llamafile,Mistral-7b-instruct-v0.2.F16.llamafile,Mistral-7b-instruct-v0.2.BF16.llamafile,Mixtral-8x7b-instruct-v0.1.Q5_K_M.llamafile,mixtral-8x7b-instruct-v0.1.Q6_K.llamafile,Mixtral-8x7b-instruct-v0.1.BF16.llamafile,Mixtral-8x22B-Instruct-v0.1.Q5_K_M.llamafile,Mixtral-8x22B-Instruct-v0.1.Q6_K.llamafile"
As you can see, matmul is memory-bound on this CPU (DDR4 + Zen 3).
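A rough way to sanity-check the memory-bound claim is to compare measured token-generation speed against a bandwidth ceiling. The sketch below is only a back-of-the-envelope estimate. It assumes a dual-channel DDR4-3600 setup (the channel count is not stated in the post), a dense model of roughly 7.24e9 parameters for Mistral 7B, nominal bits-per-weight figures for each format, and that every weight is read from RAM once per generated token. The function names are made up for illustration. It does not cover the Mixtral models, which only read a subset of experts per token.

```python
# Back-of-the-envelope ceiling for token generation on a memory-bound CPU.
# All inputs below are assumptions for illustration, not measured values.

def peak_bandwidth_gb_s(mt_per_s: float, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (64-bit bus per channel)."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9

def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB for a given quantization width."""
    return n_params * bits_per_weight / 8 / 1e9

def tokens_per_s_ceiling(bandwidth_gb_s: float, size_gb: float) -> float:
    """Upper bound on tokens/s if each token streams all weights once."""
    return bandwidth_gb_s / size_gb

if __name__ == "__main__":
    bw = peak_bandwidth_gb_s(3600, channels=2)  # assumed dual channel
    n_params = 7.24e9                           # assumed, approx. Mistral 7B
    # Nominal bits per weight; real files mix tensor types, so sizes differ a bit.
    formats = {"Q6_K": 6.5625, "Q8_0": 8.5, "F16": 16.0, "BF16": 16.0}
    print(f"peak bandwidth: {bw:.1f} GB/s")
    for name, bpw in formats.items():
        size = model_size_gb(n_params, bpw)
        print(f"{name}: ~{size:.1f} GB, ceiling ~{tokens_per_s_ceiling(bw, size):.1f} tok/s")
```

If measured generation speeds sit close to these ceilings and scale inversely with model size, that is consistent with the workload being limited by RAM bandwidth rather than compute. Real sustained bandwidth is lower than the theoretical peak, so measured numbers should fall somewhat below the estimate.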
Originally posted by @Djip007 in https://github.com/Mozilla-Ocho/llamafile/discussions/450