-
A number of Preview systems in MLPerf Inference v4.0 used fewer cards than would be typical in production, owing to limited card availability at the time. Rather than benchmarking the systems with …
-
We're exploring various optimizations available in the [Diffusers library](https://huggingface.co/docs/diffusers/main/en/optimization/opt_overview) to enhance VRAM usage and inference speed. @titan-no…
-
Traceback (most recent call last):
  File ".//src/benchmark_evaluation/bbq_eval.py", line 28, in <module>
from decoding_algorithm import Inference
File "/sea-llm/src/decoding_algorithm/__init__.py", …
-
Hi @JUGGHM, Thank you all for your great work with MMDE.
I would like to know whether you have any inference time or speed data available for any GPUs, or any related benchmarks.
Best Regard…
-
Hi, when running example inference on Mamba2:
```
python benchmarks/benchmark_generation_mamba_simple.py --model-name "state-spaces/mamba2-2.7b" --prompt "My cat wrote all this CUDA code for a new …
-
### Is there an existing issue for this bug?
- [X] I have searched the existing issues
### 🐛 Describe the bug
Got `TypeError: LlamaInferenceForwards.llama_causal_lm_forward() got an unexpected keyw…
-
### 🐛 Describe the bug
We are planning to upgrade our Python environment from 3.8 to 3.10, because PyTorch recently deprecated Python 3.8.
But we found that there are performance gaps between pyt…
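When chasing interpreter-version performance gaps like this, it helps to isolate a pure-Python workload first, so PyTorch changes are not conflated with CPython changes. Below is a minimal, stdlib-only sketch (the naive matrix multiply is a hypothetical workload chosen for illustration, not taken from the issue): run the same script under both interpreters and compare the printed timings.

```python
import sys
import timeit


def matmul_naive(a, b):
    """Naive pure-Python matrix multiply; interpreter-bound, so it is
    sensitive to CPython version differences rather than library changes."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][x] * b[x][j] for x in range(k)) for j in range(m)]
            for i in range(n)]


def bench(size=40, repeats=5):
    """Return the best-of-N wall time for the workload (seconds)."""
    a = [[float(i + j) for j in range(size)] for i in range(size)]
    b = [[float(i - j) for j in range(size)] for i in range(size)]
    # min() of timeit.repeat gives the least-noisy estimate.
    return min(timeit.repeat(lambda: matmul_naive(a, b), number=3, repeat=repeats))


if __name__ == "__main__":
    print(f"python {sys.version_info.major}.{sys.version_info.minor}: {bench():.4f}s")
```

If the pure-Python numbers are comparable across 3.8 and 3.10, the gap is more likely in the PyTorch builds (different wheels, compilers, or bundled libraries) than in the interpreter itself.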
-
![1725962698758](https://github.com/user-attachments/assets/d18ac98f-7fe9-430c-9377-02529d957823)
-
### Problem Description
Seeing a GPU fault when running the onnxruntime-inference-examples script with reduced-layer BERT models during benchmarking.
It appears the quantization/calibration steps work …
-
Current list of tasks:
- [x] threads > 1 do not work
- [x] batches > 1 do not work
- [x] check object detection task on any model to test TVM integration
- [x] detect TVM version via CK package …