ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Research: Design of llama-bench #10386

Open jumbo-q opened 1 week ago

jumbo-q commented 1 week ago

Research Stage

Previous existing literature and research

How is llama-bench designed to test performance? What does the "batch" mean in the bench, and what exactly is being tested?

Hypothesis

No response

Implementation

No response

Analysis

No response

Relevant log output

No response

JohannesGaessler commented 1 week ago

https://github.com/ggerganov/llama.cpp/tree/master/examples/llama-bench

jumbo-q commented 1 week ago

Thanks, I've seen it before. I have two questions:

  1. Is the content of the batch input self-defined (similar to some other inference frameworks), or is there a specific dataset for it? Or is it produced some other way?
  2. The output only gives an average and variance of the time per token. How is this time calculated? Is it the mean and variance over multiple runs? Also, which part of the execution is being timed, from which point to which point? (See the sketch below for the kind of measurement pattern I am asking about.)
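To make question 2 concrete, this is the kind of measurement I am imagining — a minimal, self-contained C++ sketch, not the actual llama.cpp code. It fills the prompt with random token ids (my assumption for question 1), times only the decode work in each repetition, and reports the mean and sample standard deviation across repetitions. `decode_tokens()` here is a hypothetical stand-in for the real decode call, and the defaults (prompt length 512, 5 repetitions) are taken from the llama-bench README's `-p` and `-r` flags:

```cpp
#include <chrono>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

// Hypothetical stand-in for the timed work (e.g. decoding the whole
// prompt); NOT the real llama.cpp API. It does some arithmetic so
// the example runs standalone.
static void decode_tokens(const std::vector<int32_t> & tokens) {
    volatile double sink = 0.0;
    for (int32_t t : tokens) sink += std::sqrt((double) t + 1.0);
}

int main() {
    const int n_vocab  = 32000; // assumed vocabulary size
    const int n_prompt = 512;   // like llama-bench's default -p 512
    const int n_reps   = 5;     // like llama-bench's default -r 5

    // Question 1 (assumption): the prompt content is synthetic —
    // random token ids, not a fixed dataset.
    std::mt19937 rng(1234);
    std::uniform_int_distribution<int32_t> dist(0, n_vocab - 1);
    std::vector<int32_t> tokens(n_prompt);
    for (auto & t : tokens) t = dist(rng);

    // Warmup run, excluded from the statistics.
    decode_tokens(tokens);

    // Question 2 (assumption): each repetition times only the decode
    // call (model loading is not included); mean and stddev are then
    // computed across the repetitions.
    std::vector<double> ts(n_reps); // tokens/second per repetition
    for (int r = 0; r < n_reps; ++r) {
        auto t0 = std::chrono::steady_clock::now();
        decode_tokens(tokens);
        auto t1 = std::chrono::steady_clock::now();
        double sec = std::chrono::duration<double>(t1 - t0).count();
        ts[r] = n_prompt / sec;
    }

    double mean = 0.0;
    for (double v : ts) mean += v;
    mean /= n_reps;

    double var = 0.0;
    for (double v : ts) var += (v - mean) * (v - mean);
    double stddev = std::sqrt(var / (n_reps - 1)); // sample stddev

    std::printf("pp%d: %.2f ± %.2f t/s over %d runs\n",
                n_prompt, mean, stddev, n_reps);
    return 0;
}
```

Is this roughly how llama-bench computes the numbers it reports, and does the timed region really cover only the decode calls (excluding model load and warmup)?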