-
Using the ask-tell interface, we can in principle execute multiple configurations in parallel. Do you have an intuition about whether there are "safe" ways of doing this (e.g. ask for all configurations of a bra…
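A minimal sketch of the pattern being asked about, assuming a generic ask-tell loop (the `AskTellOptimizer` below is a hypothetical stand-in, not the project's actual class): ask for a batch of configurations up front, evaluate them concurrently, then tell the results back.

```python
import random
from concurrent.futures import ThreadPoolExecutor

class AskTellOptimizer:
    """Toy ask-tell optimizer; the real tuner's API may differ."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.history = []  # (config, score) pairs

    def ask(self):
        # Propose a configuration (here: a single hyperparameter).
        return {"x": self.rng.uniform(-5, 5)}

    def tell(self, config, score):
        self.history.append((config, score))

def objective(config):
    # Dummy objective standing in for a real benchmark run.
    return config["x"] ** 2

opt = AskTellOptimizer()
batch = [opt.ask() for _ in range(4)]          # ask a whole batch up front
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = list(pool.map(objective, batch))  # evaluate in parallel
for config, score in zip(batch, scores):
    opt.tell(config, score)                    # report results back
```

Whether this is "safe" depends on whether the optimizer's model assumes it sees each result before proposing the next configuration; the sketch only illustrates the mechanics.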
-
### Motivation
Response metrics are very useful for benchmarking performance of different configurations. LMDeploy could implement similar metrics to [vLLM's `RequestMetrics`](https://github.com/vl…
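A rough sketch of what such per-request metrics could look like, modeled loosely on vLLM's `RequestMetrics` (the field and property names here are illustrative assumptions, not LMDeploy API):

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class RequestMetrics:
    # Hypothetical per-request timestamps; names are illustrative only.
    arrival_time: float
    first_token_time: Optional[float] = None
    finished_time: Optional[float] = None

    @property
    def time_to_first_token(self) -> Optional[float]:
        # Derived latency metric computed from the raw timestamps.
        if self.first_token_time is None:
            return None
        return self.first_token_time - self.arrival_time

m = RequestMetrics(arrival_time=time.monotonic())
m.first_token_time = m.arrival_time + 0.05   # pretend first token after 50 ms
m.finished_time = m.arrival_time + 0.40
```

The `time_to_first_token` property shows how benchmark-relevant latencies (TTFT, end-to-end time, queue time) fall out of a handful of raw timestamps recorded per request.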
-
#14 notes some performance issues with re-rendering `Label`s (since word wrapping is a nontrivial CPU operation).
It would be useful to have some benchmarks that could call out when a change has a …
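As a starting point, a micro-benchmark of the word-wrapping cost itself might look like the sketch below (using stdlib `textwrap` as a stand-in for whatever wrapper the widget actually uses):

```python
import textwrap
import timeit

# Rough micro-benchmark of word wrapping, the operation flagged as
# expensive; textwrap stands in for the widget's real wrapping code.
text = "lorem ipsum dolor sit amet " * 200

def wrap():
    return textwrap.wrap(text, width=40)

n = 100
elapsed = timeit.timeit(wrap, number=n)
print(f"{elapsed / n * 1e6:.1f} us per wrap of {len(text)} chars")
```

Tracking a number like this across commits is what would let CI flag a regression.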
-
**Is your feature request related to a problem? Please describe.**
Currently, benchmarks are run in CI in a limited set of configurations, mainly as a test that the benchmarks work and some specific scenarios…
-
### 📚 The doc issue
**Context:** During July 9, 2024, vLLM open office hours (FP8), there were several questions regarding how to **optimize** model deployment inference configurations targeting the …
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
```
-
Hi,
First of all, I appreciate your effort, work, and support for the open-source community.
I am just trying out Xmrig with the RandomX protocol; your benchmarking scores are much higher than I could get u…
-
With the latest machines coming into our benchmarking infrastructure being hybrid processors with efficiency and performance cores, aka big.LITTLE architectures, it becomes more desirable to configure…
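One piece of such configurability is pinning the benchmark process to one core type. A minimal sketch, assuming Linux and placeholder core ids (the right ids depend on each machine's topology):

```python
import os

# Restrict this process to an explicit core set so runs on hybrid
# (big.LITTLE) CPUs stay on one core type. The ids are placeholders.
PERFORMANCE_CORES = {0, 1}  # hypothetical P-core ids for this machine

if hasattr(os, "sched_setaffinity"):  # Linux-only API
    available = os.sched_getaffinity(0)
    # Intersect with what the OS actually offers; fall back if empty.
    target = PERFORMANCE_CORES & available or available
    os.sched_setaffinity(0, target)
    print("pinned to cores:", sorted(os.sched_getaffinity(0)))
```

The same effect can be had externally with `taskset`; doing it in-process just keeps the core selection alongside the rest of the benchmark configuration.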
-
Hi all, it's time for us to consider the official release of BitBLAS v0.0.1. Here are some todo items before this release:
- [ ] Finalize comprehensive test cases and benchmarking scripts.
- [x…
-
Currently, the setup uses the fake webserver as the metrics emitter, which only [supports standard metrics](https://github.com/prometheus/test-infra/blob/master/tools/fake-webserver/main.go#L61-L71) (e.g. …
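For reference, anything scraped beyond that fixed set would just need to be rendered in the Prometheus text exposition format. A small sketch of what a custom emitter produces (the metric name and label below are made up for illustration):

```python
def render_counter(name, value, labels=None):
    """Render one counter in Prometheus text exposition format."""
    label_str = ""
    if labels:
        inner = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        label_str = "{" + inner + "}"
    return (
        f"# HELP {name} Example counter.\n"
        f"# TYPE {name} counter\n"
        f"{name}{label_str} {value}\n"
    )

# Hypothetical metric name/label, just to show the output shape.
sample = render_counter("example_requests_total", 42, {"method": "GET"})
print(sample)
```

Each metric family is a `# HELP`/`# TYPE` pair followed by samples, so extending the emitter beyond the standard set is mostly a matter of generating more of these blocks.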