-
### OpenVINO Version
2024.4.0-16579-c3152d32c9c-releases/2024/4
### Operating System
Other (Please specify in description)
### Device used for inference
NPU
### Framework
None
### Model used
…
-
We need to verify that all the changes in the external forks of the MLCommons repos used in our past SCC'22 and SCC'23 tutorials have been merged into the MLCommons mainline repositories.
We can then update these tu…
-
I have a few questions about the inference efficiency of DeepSeek-V2.
1.
> In order to efficiently deploy DeepSeek-V2 for service, we first convert its parameters into the precision of FP8.
Ar…
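For context, the FP8 conversion the quote describes can be sketched with a simple per-tensor absmax scaling scheme. This is a hypothetical illustration of the general technique, not DeepSeek-V2's actual quantization recipe; the E4M3 max value and the helper names are assumptions for the sketch.

```python
import numpy as np

# Largest finite value representable in the FP8 E4M3 format.
FP8_E4M3_MAX = 448.0

def quantize_fp8(w: np.ndarray):
    """Scale weights into the FP8 range; keep the scale for dequantization.

    Hypothetical per-tensor absmax scheme -- real kernels would cast the
    scaled values to an 8-bit float type here; we keep float32 so the
    sketch only shows the scaling logic.
    """
    scale = float(np.abs(w).max()) / FP8_E4M3_MAX
    q = np.clip(w / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

def dequantize_fp8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate full-precision weights."""
    return q * scale

w = np.array([0.5, -1.2, 3.0], dtype=np.float32)
q, s = quantize_fp8(w)
w_back = dequantize_fp8(q, s)
```

Because this sketch skips the actual 8-bit cast, the round-trip is lossless; in a real deployment the cast introduces the quantization error that FP8 serving trades for memory and bandwidth savings.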
-
Cool model! I'll give it a try.
I'd like to know the minimal hardware requirement for 5 tokens/s.
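A rough way to frame that question: autoregressive decoding is usually memory-bandwidth bound, so tokens/s is roughly memory bandwidth divided by the bytes read per token. The sketch below is a back-of-envelope estimate under that assumption; the 21B active-parameter count (DeepSeek-V2's MoE activates only a subset of its weights per token) and 1 byte/param (FP8) are assumptions, and KV-cache traffic is ignored.

```python
def min_bandwidth_gb_s(active_params_b: float,
                       bytes_per_param: float,
                       target_tok_s: float) -> float:
    """Estimate the memory bandwidth (GB/s) needed to hit a decode rate,
    assuming decode is memory-bound and every active weight is read once
    per generated token."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bytes_per_token * target_tok_s / 1e9

# Assumed: ~21B active params at FP8 (1 byte/param), 5 tok/s target.
needed = min_bandwidth_gb_s(21, 1.0, 5)  # 105.0 GB/s
```

By this crude bound, any device that can hold the weights and sustain on the order of 100+ GB/s of effective bandwidth could reach ~5 tok/s; real throughput will be lower once KV-cache reads and kernel overheads are counted.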
-
I hope everyone who visits this issue can share their system, CPU, GPU, inference speed in the GPT stage, and the version used (ideally with v2 as a baseline for comparison),
so that we can see how different me…
-
Some edge systems may not be connected to the internet, so we need a way to run MLPerf inference benchmarks on them using CM.
-
### Your current environment
Offline inference of Llama-3-8B with benchmark_latency.py, sweeping over 1, 2, and 4 cards, gives these results:
And the optimum-habana results:
The results show that on 1 card…
-
Tracker issue for the educational project
-
Checked https://docs.mlcommons.org/inference/benchmarks but found:
@ashwin @jdduke @codyaustun @badenh @koichishirahata
-
From https://github.com/pytorch/pytorch/pull/134282#issuecomment-2307157197: in the aarch64 dashboard results, benchmarking with fp16 is 2x~10x slower than bf16, often causing timeouts in some cases.…