Hello, I noticed that the benchmarking results are often presented using inconsistent metrics, making it hard to compare. For example, the BLAS library is presented in kernel execution time, but the solve library is presented in latency. What is the reason behind using latency as a metric rather than kernel execution time? Does that mean the computation is pipelined and streaming rather than a single execution?
Hello, I noticed that the benchmarking results are often presented using inconsistent metrics, making it hard to compare. For example, the BLAS library is presented in kernel execution time, but the solve library is presented in latency. What is the reason behind using latency as a metric rather than kernel execution time? Does that mean the computation is pipelined and streaming rather than a single execution?