-
To evaluate sparsification results, we need to measure the performance of each inference step: relevance and kpi-extraction.
- As a part of this issue, create a notebook called benchmarks…
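A minimal sketch of what such a benchmark notebook could time per step, assuming hypothetical `run_relevance` and `run_kpi_extraction` stand-ins for the real inference steps (all names here are invented for illustration):

```python
import time
import statistics

def benchmark_step(step_fn, inputs, warmup=2, repeats=10):
    """Time one inference step over `repeats` runs after a short warmup."""
    for _ in range(warmup):
        step_fn(inputs)
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        step_fn(inputs)
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "mean_s": statistics.mean(latencies),
        "p95_s": latencies[int(0.95 * (len(latencies) - 1))],
    }

# Hypothetical stand-ins for the real relevance / kpi-extraction steps.
def run_relevance(docs):
    return [len(d) > 3 for d in docs]

def run_kpi_extraction(docs):
    return [d.upper() for d in docs]

docs = ["revenue up 4%", "ebit flat"]
results = {name: benchmark_step(fn, docs)
           for name, fn in [("relevance", run_relevance),
                            ("kpi-extraction", run_kpi_extraction)]}
```

Reporting mean plus a tail percentile per step makes it easy to see which stage dominates end-to-end cost after sparsification.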
-
Hi SGLang team,
I have just tried SGLang for the first time, and it was probably one of the easiest projects to set up and launch - it literally took me a few minutes to go from 0 to serving - awes…
-
Setting the launcher to `process` on an MPS device causes the benchmark to hang permanently on "Sending report to main process."
Steps to reproduce:
Run this configuration on an MPS device …
-
## Problem Description
When trying to use pipeline parallelism in TensorRT-LLM on 2+ NVIDIA GPUs, I encounter `AssertionError: Expected but not provided tensors: {'transformer.vocab_embedding.weig…
-
### 🐛 Describe the bug
torchbench_amp_bf16_inference
- [ ] `moco`
Traceback (most recent call last):
File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", …
-
http://127.0.0.1:8000/krai_qaic_task/benchmark/QuickBenchmarking
-
http://127.0.0.1:8000/tmp/benchmark/QuickBenchmarking/
QAIC
-
### OpenVINO Version
2024.3.0
### Operating System
Windows System
### Device used for inference
GPU
### Framework
None
### Model used
Mask R-CNN
### Issue description
…
-
**What questions are you trying to answer? Please describe.**
Analyze latency/throughput for NVTabular + TensorFlow models served on Triton Inference Server
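A rough sketch of the measurement loop such an analysis could start from, with a stubbed `fake_infer` standing in for the actual Triton client request (the function names and stats keys below are placeholders, not the NVTabular or Triton API):

```python
import time

def measure(infer, batch, n_requests=50):
    """Sequentially issue n_requests and derive latency/throughput stats."""
    latencies = []
    t0 = time.perf_counter()
    for _ in range(n_requests):
        start = time.perf_counter()
        infer(batch)
        latencies.append(time.perf_counter() - start)
    wall = time.perf_counter() - t0
    latencies.sort()
    return {
        "p50_ms": latencies[len(latencies) // 2] * 1e3,
        "p99_ms": latencies[int(0.99 * (len(latencies) - 1))] * 1e3,
        "throughput_rps": n_requests / wall,
    }

# Placeholder for a real client request against the served ensemble.
def fake_infer(batch):
    time.sleep(0.001)  # simulate ~1 ms of server-side work
    return {"predictions": [0.5] * len(batch)}

stats = measure(fake_infer, batch=list(range(32)))
```

Sweeping batch size and concurrency over this loop (or using a dedicated load generator) would give the latency/throughput curves the question asks for.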
-
From https://github.com/pytorch/pytorch/pull/133065#issuecomment-2288701447. Basically, there was a noticeable performance drop on the inference side after bumping the HF pin, [dashboard](https://…