-
I tried running the Nemo-12b 4-bit model on a single T4 GPU, but inference is very slow. Additionally, the `forward` function takes much longer than `generate`.
Is there a speedup benchmark for the…
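For reference, the timing pattern I have in mind is roughly the sketch below; the checkpoint id, prompt, and token count are placeholders (assuming the model is loaded through transformers with bitsandbytes 4-bit quantization), not my exact setup:

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Placeholder checkpoint id for illustration only.
model_id = "mistralai/Mistral-Nemo-Instruct-2407"

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb, device_map="auto")

inputs = tok("Hello", return_tensors="pt").to(model.device)

# One forward pass over the prompt.
torch.cuda.synchronize()
t0 = time.perf_counter()
with torch.no_grad():
    model(**inputs)
torch.cuda.synchronize()
print(f"forward: {time.perf_counter() - t0:.3f}s")

# Autoregressive generation (many forward passes plus sampling overhead).
torch.cuda.synchronize()
t0 = time.perf_counter()
model.generate(**inputs, max_new_tokens=32)
torch.cuda.synchronize()
print(f"generate (32 new tokens): {time.perf_counter() - t0:.3f}s")
```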
-
## ❓ Questions and Help
Does maskrcnn-benchmark support half-precision inference? If not, what should I add?
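For context, the pattern I have in mind is plain PyTorch autocast inference. The sketch below uses torchvision's Mask R-CNN only as a stand-in model (maskrcnn-benchmark builds its model from a config instead), so it is an illustration of the intent rather than a working integration:

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Stand-in model; maskrcnn-benchmark would build its detector from cfg instead.
model = maskrcnn_resnet50_fpn(weights=None).cuda().eval()

images = [torch.rand(3, 800, 800, device="cuda")]

# Half-precision inference via autocast: weights stay fp32, convs/matmuls run in fp16.
with torch.no_grad(), torch.cuda.amp.autocast(dtype=torch.float16):
    outputs = model(images)

print(outputs[0]["boxes"].shape)
```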
-
### Description
The PR [#39446](https://github.com/ray-project/ray/pull/39446) disables preloading Jemalloc for workers entirely. However, Jemalloc is still useful in some cases, and we could make it …
-
### System Info
Hi Team,
First of all, huge thanks for all the great work you are doing.
Recently, I was benchmarking inference for a T5 model on AWS EC2 (a G6E instance with an L40 GPU) for batch sizes…
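For context, the measurement loop looks roughly like the sketch below; the checkpoint, prompt, batch sizes, and token count are placeholders rather than the exact benchmark configuration:

```python
import time
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Placeholder checkpoint; the actual benchmark used a different T5 variant.
model_id = "t5-base"
tok = T5Tokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.float16).cuda().eval()

prompt = "translate English to German: The house is wonderful."

for batch_size in (1, 4, 16, 64):
    batch = tok([prompt] * batch_size, return_tensors="pt", padding=True).to("cuda")
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    with torch.no_grad():
        model.generate(**batch, max_new_tokens=32)
    torch.cuda.synchronize()
    dt = time.perf_counter() - t0
    print(f"batch={batch_size:3d}  latency={dt:.3f}s  ({batch_size * 32 / dt:.1f} tok/s)")
```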
-
### Motivation
This is an interesting blog post [FireAttention V2: 12x faster to make Long Contexts practical for Online Inference](https://fireworks.ai/blog/fireattention-v2-long-context-inference…
-
### Is there an existing issue for this bug?
- [X] I have searched the existing issues
### 🐛 Describe the bug
I failed to run the ChatGLM model with ColossalAI 0.3.6.
The backtrace is here:
----…
-
### Proposal to improve performance
_No response_
### Report of performance regression
_No response_
### Misc discussion on performance
To reproduce vLLM's performance benchmark, please…
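As a rough illustration (not the official benchmark setup), an offline throughput measurement with vLLM's Python API could look like the sketch below; the model id, prompt count, and sampling settings are placeholders:

```python
import time
from vllm import LLM, SamplingParams

# Placeholder model and workload, for illustration only.
llm = LLM(model="facebook/opt-125m")
sampling = SamplingParams(temperature=0.8, max_tokens=128)

prompts = ["Explain the benefits of paged attention."] * 256

t0 = time.perf_counter()
outputs = llm.generate(prompts, sampling)
dt = time.perf_counter() - t0

# Count generated tokens across all requests to get a throughput figure.
generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated / dt:.1f} generated tokens/s over {len(prompts)} prompts")
```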
-
**What would you like to be added/modified**:
A benchmark suite for large language models deployed at the edge using KubeEdge-Ianvs:
1. Interface Design and Usage Guidelines Document;
2. Implem…
-
### System Info
tgi-gaudi Docker container built from the master branch (4fe871ffaaa62f1a203607078e868fcca962b017)
Ubuntu 22.04.3 LTS
Gaudi2
HL-SMI Version: hl-1.15.0-fw-48.2.1.1
Driver Version: 1…
-
Hello everyone.
I have been using MLPerf benchmarks for some time, and I have a small list of questions about them. I am asking them here because I have not found answers in other sources of informat…