
[Question] GPU vs Metal performance & Seeding models #4384

Closed: aramcheck closed this issue 3 months ago

aramcheck commented 7 months ago

Hello,

This is not an issue report, but hopefully it is okay to ask here.

I ran some tests using llama.cpp on Apple Silicon (MacBook Air M1) and on an NVIDIA Quadro M4000. Using the same model, orca-2-7b.Q4_0.gguf, I got much better performance on the M1. Concretely:

llama.cpp on M1 (Metal)

llama_print_timings:        load time =    1035.74 ms
llama_print_timings:      sample time =     166.85 ms /   216 runs   (    0.77 ms per token,  1294.54 tokens per second)
llama_print_timings: prompt eval time =     808.49 ms /    84 tokens (    9.62 ms per token,   103.90 tokens per second)
llama_print_timings:        eval time =   15542.42 ms /   215 runs   (   72.29 ms per token,    13.83 tokens per second)
llama_print_timings:       total time =   16641.54 ms

llama.cpp on the Quadro M4000 (GPU)

llama_print_timings:        load time =     725.88 ms
llama_print_timings:      sample time =     135.49 ms /   244 runs   (    0.56 ms per token,  1800.86 tokens per second)
llama_print_timings: prompt eval time =    2047.53 ms /    84 tokens (   24.38 ms per token,    41.02 tokens per second)
llama_print_timings:        eval time =   52313.11 ms /   243 runs   (  215.28 ms per token,     4.65 tokens per second)
llama_print_timings:       total time =   54576.09 ms
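
For reference, a sketch of how a run like the above might be invoked (assuming the main example and its standard flags; the model path, seed, token count, and prompt file below are placeholders, and -ngl controls how many layers are offloaded to Metal or the GPU):

./main -m orca-2-7b.Q4_0.gguf -ngl 99 -n 256 --seed 42 -f prompt.txt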

The other aspect that caught my attention is that running both models with the same seed yields different results on the two architectures, although the completions look very similar in structure.

I don't have a deep understanding of how the seed is implemented, but I wanted to ask whether both observations are expected based on your experience.
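
One plausible explanation, shown as a toy sketch below (not llama.cpp code): the seed only fixes the sampler's random draws, while the logits computed by different backends can differ by tiny floating-point amounts; if two candidate tokens are nearly tied at some step, the chosen token can flip, and the completions diverge from there while often staying similar in structure.

// Toy sketch (not llama.cpp code): the seed only fixes the sampler's random
// draws. If two backends compute slightly different logits (different
// floating-point ordering in the kernels), a near-tied token choice can flip
// at some step even with the same seed, and the completions diverge from there.
#include <algorithm>
#include <cstdio>
#include <vector>

// pick the highest-scoring token (greedy choice for simplicity)
static int pick_token(const std::vector<float> & logits) {
    return (int) (std::max_element(logits.begin(), logits.end()) - logits.begin());
}

int main() {
    // hypothetical logits for the same step on two backends: tokens 0 and 1
    // are nearly tied, and each backend perturbs them slightly differently
    std::vector<float> metal_logits = {2.31001f, 2.31000f, 0.70f};
    std::vector<float> gpu_logits   = {2.31000f, 2.31002f, 0.70f};

    printf("Metal picks token %d\n", pick_token(metal_logits)); // prints 0
    printf("GPU   picks token %d\n", pick_token(gpu_logits));   // prints 1
    return 0;
}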

This is the prompt I used:

<|im_start|>system
You are Orca, an AI language model created by Microsoft. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.<|im_end|>
<|im_start|>user
What are the main challenges in higher Education once AGI (Artificial General Intelligence) is achieved?<|im_end|>
<|im_start|>assistant

And both responses:

Metal: There are different possible scenarios for how AGI could affect higher education, depending on how it is developed, implemented, and regulated. Some of the main challenges that could arise are:

GPU: There are different possible scenarios for how AGI might affect higher education, depending on how it is developed, deployed, and regulated. However, some of the main challenges that could be encountered are:

github-actions[bot] commented 4 months ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 3 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.