-
What is the time taken to process a 3/5/10s video on an A100/H100 GPU?
It would be great to know about any of your existing benchmarks!
-
I tried to replicate the results from your benchmarks using Docker with GPU support and the images nvidia/cuda:12.6.1-cudnn-devel-ubuntu22.04 and nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04. After I instal…
-
We are benchmarking Triton with different backends, but we are unable to find the metric to calculate the latency of each request (let's assume each request has a batch size of `b`).
1. Is request la…
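Not an answer from the Triton side, but for reference: per-request latency is usually measured as wall-clock time from issuing the request to receiving the response, and per-sample latency as that value divided by `b`. A minimal sketch, assuming a blocking client call (the `run_request` callable is a stand-in for a real Triton client `infer()`; all names here are illustrative):

```python
import time
import statistics

def measure_latency(run_request, n_requests=100, batch_size=8):
    """Time each request end to end; run_request stands in for a real,
    blocking inference call (e.g. a tritonclient infer())."""
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        run_request(batch_size)  # blocking inference call for one batch
        latencies.append(time.perf_counter() - start)
    lat_sorted = sorted(latencies)
    return {
        "mean_s": statistics.mean(latencies),
        "p50_s": lat_sorted[len(lat_sorted) // 2],
        "p99_s": lat_sorted[max(int(len(lat_sorted) * 0.99) - 1, 0)],
        # per-sample latency: request latency divided by the batch size b
        "per_sample_mean_s": statistics.mean(latencies) / batch_size,
    }

# Example with a dummy request that sleeps ~1 ms per call
stats = measure_latency(lambda b: time.sleep(0.001),
                        n_requests=20, batch_size=4)
print(stats)
```

Note this measures client-observed latency, which includes network and queuing time; Triton's own metrics endpoint breaks latency into queue/compute components separately.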
-
Just wanted to point this out: occasionally you get the following error when using the inference benchmark:
```
2024-09-17 07:56 INFO User selected random dataset. Generating prompt and output l…
-
Hi, how can I run a text-only benchmark in this inference framework?
-
### Community Note
* Please vote on this issue by adding a 👍 [reaction](https://blog.github.com/2016-03-10-add-reactions-to-pull-requests-issues-and-comments/) to the original issue to help the…
-
https://docs.mlcommons.org/inference/benchmarks/text_to_image/reproducibility/scc24
-
ERROR: [Torch-TensorRT] - Unsupported operator: aten::to.dtype_layout(Tensor(a) self, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, bool non_blocking=Fals…
-
Hey, if you are open to this feature, I can add an ONNX inference benchmark with the CUDA execution provider.
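As a rough sketch of what that benchmark could look like: a warm-up phase (important on CUDA, where early runs include kernel compilation and allocations) followed by timed iterations. The ONNX Runtime usage is kept in a comment so the timing skeleton runs anywhere; the model path and input name in that comment are assumptions:

```python
import time

def bench(infer, warmup=10, iters=50):
    """Generic benchmark loop: discard warm-up iterations, then average
    the timed ones. `infer` stands in for a real call such as:

        sess = onnxruntime.InferenceSession(
            "model.onnx", providers=["CUDAExecutionProvider"])
        infer = lambda: sess.run(None, {"input": batch})
    """
    for _ in range(warmup):
        infer()
    start = time.perf_counter()
    for _ in range(iters):
        infer()
    return (time.perf_counter() - start) / iters  # mean seconds per call

mean_s = bench(lambda: sum(range(1000)))  # dummy workload
print(f"{mean_s * 1e6:.1f} us per call")
```

Since `sess.run()` blocks until outputs are copied back to the host, no explicit device synchronization is needed around the timer, unlike with IO binding or raw CUDA streams.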
-
Hello, I have set up an inference platform with more than 100 GPUs that provides inference services for prevalent LLMs. I want to join this benchmark; how can I do it?