-
I use lm-evaluation-harness to test vLLM accuracy.
1. When speculative decoding is not enabled, I get the results below.
vLLM command:
CUDA_VISIBLE_DEVICES=0 python -m vllm.entrypoints.openai.api_server --model /…
-
**What would you like to be added**:
Increase client-side QPS and Burst
**Why is this needed**:
We hit the client-side throttling limit even with the `parallel: 1` option, and it makes the test run slower.
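Assuming the throttling comes from client-go's default rate limiter (QPS 5, Burst 10), a minimal sketch of raising both limits on the `rest.Config` might look like the following; the kubeconfig path and the concrete values are illustrative, not from the report:
```go
package main

import (
	"fmt"

	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig the usual way (path is illustrative).
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}

	// client-go defaults to QPS=5, Burst=10; raising them avoids
	// client-side throttling during a test run. Values are examples.
	config.QPS = 50
	config.Burst = 100

	fmt.Printf("QPS=%v Burst=%v\n", config.QPS, config.Burst)
	_ = rest.Config{} // config is a *rest.Config; shown for clarity
}
```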
-
Currently the code allocates a single channel with [a fixed capacity of 1000000 results](https://github.com/unbork/hey/blob/7f27e71cf07e53b5911a38b94ae8923038b5d711/requester/requester.go#L33). This limits the us…
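For context, a minimal sketch of sizing the buffer from the configured request count instead of a hard-coded constant; the `result` type here is a stand-in, not hey's actual struct:
```go
package main

import "fmt"

// result is a stand-in for hey's per-request measurement struct.
type result struct{ statusCode int }

func main() {
	n := 2_000_000 // requests to run; can exceed the fixed 1000000 cap

	// Instead of make(chan *result, 1000000), size the buffer from n
	// so runs beyond a million results are not constrained.
	results := make(chan *result, n)

	results <- &result{statusCode: 200}
	close(results)
	for r := range results {
		fmt.Println(r.statusCode)
	}
}
```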
-
Description:
I propose adding the ability to specify multiple beacon node endpoints (comma-separated) for validator clients. Instead of waiting for a failover when one beacon node fails, the validato…
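To illustrate the idea, here is a sketch of parsing a comma-separated list and picking the first healthy endpoint. The flag value and ports are hypothetical, not an existing validator-client option; `/eth/v1/node/health` is the standard beacon API health route:
```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"
)

// firstHealthy returns the first endpoint that answers a health probe.
func firstHealthy(endpoints []string) (string, error) {
	client := &http.Client{Timeout: 2 * time.Second}
	for _, ep := range endpoints {
		resp, err := client.Get(ep + "/eth/v1/node/health")
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return ep, nil
			}
		}
	}
	return "", fmt.Errorf("no healthy beacon node among %d endpoints", len(endpoints))
}

func main() {
	// Hypothetical comma-separated flag value.
	flagValue := "http://beacon-a:3500,http://beacon-b:3500"
	endpoints := strings.Split(flagValue, ",")
	if ep, err := firstHealthy(endpoints); err == nil {
		fmt.Println("using beacon node:", ep)
	}
}
```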
-
### System Info
```shell
using Huggingface AMI from AWS marketplace with Ubuntu 22.04
optimum-neuron 0.0.25
transformers 4.45.2
peft 0.13.0
trl 0.11.4
accelerate 0.29.2
torch 2.1.2
```
…
-
This is not exactly an issue; the situation is:
I am running a COG container locally and I want to process multiple requests at once; however, when I hit it with 100 requests at once, it gave me output for 20 …
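For reproduction, a sketch that fires 100 concurrent requests against a locally running Cog container, assuming the default setup where Cog serves `POST /predictions` on port 5000; the input payload is a placeholder:
```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"sync"
)

func main() {
	const n = 100
	payload := []byte(`{"input": {}}`) // placeholder prediction input

	var wg sync.WaitGroup
	var mu sync.Mutex
	ok, failed := 0, 0

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			resp, err := http.Post("http://localhost:5000/predictions",
				"application/json", bytes.NewReader(payload))
			mu.Lock()
			defer mu.Unlock()
			if err != nil || resp.StatusCode != http.StatusOK {
				failed++
			} else {
				ok++
			}
			if resp != nil {
				resp.Body.Close()
			}
		}()
	}
	wg.Wait()
	fmt.Printf("ok=%d failed=%d of %d\n", ok, failed, n)
}
```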
-
### What would you like to be added?
An image digest check might be needed while the kubelet pulls images in "Parallel" mode.
Pull requests for the same image digest should not need to queue up and wait meaninglessly (see the sketch below).
Cod…
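One way to express the deduplication idea is with `golang.org/x/sync/singleflight`, keyed by image digest, so concurrent pulls of the same digest collapse into a single underlying operation; `pullImage` here is a placeholder, not kubelet code:
```go
package main

import (
	"fmt"
	"sync"
	"time"

	"golang.org/x/sync/singleflight"
)

var group singleflight.Group

// pullImage is a placeholder for the real image pull; with singleflight
// it runs once per digest no matter how many goroutines request it.
func pullImage(digest string) (string, error) {
	time.Sleep(100 * time.Millisecond) // simulate the pull
	return "pulled " + digest, nil
}

func main() {
	digest := "sha256:abc123"
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// All five calls with the same key share one pull.
			v, _, shared := group.Do(digest, func() (interface{}, error) {
				return pullImage(digest)
			})
			fmt.Println(v, "shared:", shared)
		}()
	}
	wg.Wait()
}
```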
-
If I specify LCCSCF_PIPELINE, then my connection requests get serialized onto the same socket, so there is reuse but only one GET at a time.
If I don't specify LCCSCF_PIPELINE, then LWS closes the conn…
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
- OS: **Ubuntu 22.04**
- GPUs: **2x 4090** (2x 24GB)
- CUDA: **11.8**
- CPU: **Ryzen 3800X**
- RAM: **64GB**
- vLLM build: **main** `400b8289`
Started the API server with this command:
```sh
…