-
I use lm-evaluation-harness to test vLLM accuracy.
1. When speculative decoding is not enabled, I get the results below.
vLLM command:
CUDA_VISIBLE_DEVICES=0 python -m vllm.entrypoints.openai.api_server --model /…
-
**What would you like to be added**:
Increase client-side QPS and Burst
**Why is this needed**:
We hit the client-side throttling limit even with the `parallel: 1` option, and it makes the test run slower.
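Assuming the throttling comes from client-go's default rate limiter (QPS 5, Burst 10), a minimal sketch of raising both limits on the `rest.Config` might look like the following; the kubeconfig path and the concrete values are illustrative, not from the report:
```go
package main

import (
	"fmt"

	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig the usual way (path is illustrative).
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}

	// client-go defaults to QPS=5, Burst=10; raising them avoids
	// client-side throttling during a test run. Values are examples.
	config.QPS = 50
	config.Burst = 100

	fmt.Printf("QPS=%v Burst=%v\n", config.QPS, config.Burst)
	_ = rest.Config{} // config is a *rest.Config; shown for clarity
}
```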
-
Currently the code allocates a single channel with [a fixed capacity of 1000000 results](https://github.com/unbork/hey/blob/7f27e71cf07e53b5911a38b94ae8923038b5d711/requester/requester.go#L33). This limits the us…
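For context, a minimal sketch of sizing the buffer from the configured request count instead of a hard-coded constant; the `result` type here is a stand-in, not hey's actual struct:
```go
package main

import "fmt"

// result is a stand-in for hey's per-request measurement struct.
type result struct{ statusCode int }

func main() {
	n := 2_000_000 // requests to run; can exceed the fixed 1000000 cap

	// Instead of make(chan *result, 1000000), size the buffer from n
	// so runs beyond a million results are not constrained.
	results := make(chan *result, n)

	results <- &result{statusCode: 200}
	close(results)
	for r := range results {
		fmt.Println(r.statusCode)
	}
}
```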
-
Description:
I propose adding the ability to specify multiple beacon node endpoints (comma-separated) for validator clients. Instead of waiting for a failover when one beacon node fails, the validato…
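To illustrate the idea, here is a sketch of parsing a comma-separated list and picking the first healthy endpoint. The flag value and ports are hypothetical, not an existing validator-client option; `/eth/v1/node/health` is the standard beacon API health route:
```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"
)

// firstHealthy returns the first endpoint that answers a health probe.
func firstHealthy(endpoints []string) (string, error) {
	client := &http.Client{Timeout: 2 * time.Second}
	for _, ep := range endpoints {
		resp, err := client.Get(ep + "/eth/v1/node/health")
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return ep, nil
			}
		}
	}
	return "", fmt.Errorf("no healthy beacon node among %d endpoints", len(endpoints))
}

func main() {
	// Hypothetical comma-separated flag value.
	flagValue := "http://beacon-a:3500,http://beacon-b:3500"
	endpoints := strings.Split(flagValue, ",")
	if ep, err := firstHealthy(endpoints); err == nil {
		fmt.Println("using beacon node:", ep)
	}
}
```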
-
### System Info
```shell
using Huggingface AMI from AWS marketplace with Ubuntu 22.04
optimum-neuron 0.0.25
transformers 4.45.2
peft 0.13.0
trl 0.11.4
accelerate 0.29.2
torch 2.1.2
```
…
-
This is not exactly an issue; the situation is:
I am running a COG container locally and I want to process multiple requests at once; however, when I hit it with 100 requests at once, it gave me output for 20 …
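For reproduction, a sketch that fires 100 concurrent requests against a locally running Cog container, assuming the default setup where Cog serves `POST /predictions` on port 5000; the input payload is a placeholder:
```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"sync"
)

func main() {
	const n = 100
	payload := []byte(`{"input": {}}`) // placeholder prediction input

	var wg sync.WaitGroup
	var mu sync.Mutex
	ok, failed := 0, 0

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			resp, err := http.Post("http://localhost:5000/predictions",
				"application/json", bytes.NewReader(payload))
			mu.Lock()
			defer mu.Unlock()
			if err != nil || resp.StatusCode != http.StatusOK {
				failed++
			} else {
				ok++
			}
			if resp != nil {
				resp.Body.Close()
			}
		}()
	}
	wg.Wait()
	fmt.Printf("ok=%d failed=%d of %d\n", ok, failed, n)
}
```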
-
### What would you like to be added?
An image digest check might be needed while the kubelet pulls images in "Parallel" mode.
Pull requests for the same image digest should not need to queue up and wait meaninglessly (see the sketch below).
Cod…
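One way to express the deduplication idea is with `golang.org/x/sync/singleflight`, keyed by image digest, so concurrent pulls of the same digest collapse into a single underlying operation; `pullImage` here is a placeholder, not kubelet code:
```go
package main

import (
	"fmt"
	"sync"
	"time"

	"golang.org/x/sync/singleflight"
)

var group singleflight.Group

// pullImage is a placeholder for the real image pull; with singleflight
// it runs once per digest no matter how many goroutines request it.
func pullImage(digest string) (string, error) {
	time.Sleep(100 * time.Millisecond) // simulate the pull
	return "pulled " + digest, nil
}

func main() {
	digest := "sha256:abc123"
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// All five calls with the same key share one pull.
			v, _, shared := group.Do(digest, func() (interface{}, error) {
				return pullImage(digest)
			})
			fmt.Println(v, "shared:", shared)
		}()
	}
	wg.Wait()
}
```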
-
If I specify LCCSCF_PIPELINE, then my connection requests get serialized onto the same socket, so there is reuse but only one GET at a time.
If I don't specify LCCSCF_PIPELINE, then LWS closes the conn…
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
- OS: **Ubuntu 22.04**
- GPUs: **2x 4090** (2x 24GB)
- CUDA: **11.8**
- CPU: **Ryzen 3800X**
- RAM: **64GB**
- vLLM build: **main** `400b8289`
Started the API server with this command:
```sh
…