-
### News
- Conferences
  - AAAI 2023: Washington, DC (Feb 7-14)
- [Google Cloud joins hands with Anthropic to counter the MS + OpenAI pairing?](https://www.googlecloudpresscorner.com/2023-02-03-Anthropic-Forges-Partnership…
-
Right now, all media loading is done in parallel, which isn't ideal and can result in unnecessary dropped frames (observed by @aubilenon).
In an ideal world:
- high priority: media frames that wil…
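The prioritized loading described above could be sketched with a simple priority queue. This is a hypothetical illustration (the class and task names below are made up, not the project's actual API): instead of firing every load in parallel, tasks are queued by priority so soon-to-be-displayed frames load before background prefetches.

```python
import heapq

# Lower number = served first. HIGH would be frames about to be displayed;
# LOW would be background prefetches. These constants are illustrative.
HIGH, LOW = 0, 1

class MediaLoadQueue:
    """Hypothetical priority queue for media load tasks."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker keeps FIFO order within a priority

    def submit(self, priority, task_name):
        # The counter makes heap entries totally ordered even when
        # priorities are equal, so same-priority tasks stay FIFO.
        heapq.heappush(self._heap, (priority, self._counter, task_name))
        self._counter += 1

    def next_task(self):
        """Pop the highest-priority pending task, or None when empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

With this scheme a high-priority visible frame submitted after two low-priority prefetches is still dequeued first, which is the behavior the issue asks for.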
egnor updated
2 years ago
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1…
-
### Your current environment
```text
Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A
OS: Ubuntu …
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
### Your current environment
```text
(vllm) nd600@PC-7C610BFD7B:~$ python collect_env.py
Collecting environment information...
/home/nd600/miniconda3/envs/vllm/lib/python3.10/site-packages/torch…
-
```text
Running loglikelihood requests: 0%| | 0/18330 [00:00
-
### Proposal to improve performance
Recently, vLLM has seen a great deal of work on Speculative Decoding, often with remarkable results.
For the Speculative Decoding algorit…
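The draft-and-verify idea behind speculative decoding can be sketched with toy stand-in models. This is a minimal illustration, not vLLM's implementation: `draft_model` and `target_model` below are hypothetical deterministic functions chosen so that the draft agrees with the target most of the time.

```python
def draft_model(context):
    # Cheap "draft" model (toy rule): next token is last token + 2.
    return context[-1] + 2

def target_model(context):
    # Expensive "target" model (toy rule): usually agrees with the draft,
    # but diverges whenever the last token is a multiple of 7.
    last = context[-1]
    return last + 3 if last % 7 == 0 else last + 2

def speculative_decode(context, k=4, steps=3):
    """Draft k tokens at a time, then verify them against the target model."""
    out = list(context)
    accepted_total = 0
    for _ in range(steps):
        # 1) Draft k tokens autoregressively with the cheap model.
        draft, ctx = [], list(out)
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify: accept the longest prefix the target model agrees with.
        n_accept, ctx = 0, list(out)
        for t in draft:
            if target_model(ctx) != t:
                break
            n_accept += 1
            ctx.append(t)
        out.extend(draft[:n_accept])
        # 3) Emit one target-model token at the first disagreement (or after
        #    full acceptance), so the output exactly matches what plain
        #    greedy target-model decoding would have produced.
        out.append(target_model(out))
        accepted_total += n_accept
    return out, accepted_total
```

The key property this sketch preserves is that the final sequence is identical to plain greedy decoding with the target model alone; the draft model only changes how many target evaluations are needed, not the output.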
-
For example, the 1.1B TinyLlama.
-
- I change the [batch size](https://github.com/flexflow/FlexFlow/blob/inference/inference/models/opt.cc#L71) to 2.
- Then I use the command below to run OPT-6.7B:
`../build/inference/spec…