-
### News
- Conferences
  - AAAI 2023: Washington, DC (Feb 7-14)
- [Google Cloud joins hands with Anthropic to counter the MS + OpenAI pairing?](https://www.googlecloudpresscorner.com/2023-02-03-Anthropic-Forges-Partnership…
-
Right now, all media loading is done in parallel, which isn't ideal and can result in unnecessary dropped frames (observed by @aubilenon).
In an ideal world:
- high priority: media frames that wil…
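The prioritized loading described above could be sketched with a simple priority queue. This is a hypothetical illustration (the class and task names below are made up, not the project's actual API): instead of firing every load in parallel, tasks are queued by priority so soon-to-be-displayed frames load before background prefetches.

```python
import heapq

# Lower number = served first. HIGH would be frames about to be displayed;
# LOW would be background prefetches. These constants are illustrative.
HIGH, LOW = 0, 1

class MediaLoadQueue:
    """Hypothetical priority queue for media load tasks."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker keeps FIFO order within a priority

    def submit(self, priority, task_name):
        # The counter makes heap entries totally ordered even when
        # priorities are equal, so same-priority tasks stay FIFO.
        heapq.heappush(self._heap, (priority, self._counter, task_name))
        self._counter += 1

    def next_task(self):
        """Pop the highest-priority pending task, or None when empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

With this scheme a high-priority visible frame submitted after two low-priority prefetches is still dequeued first, which is the behavior the issue asks for.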
egnor updated
2 years ago
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1…
-
### Your current environment
```text
Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A
OS: Ubuntu …
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
### Your current environment
```text
(vllm) nd600@PC-7C610BFD7B:~$ python collect_env.py
Collecting environment information...
/home/nd600/miniconda3/envs/vllm/lib/python3.10/site-packages/torch…
-
```text
Running loglikelihood requests: 0%| | 0/18330 [00:00
-
### Proposal to improve performance
Recently, vLLM has seen a great deal of work on Speculative Decoding, often with remarkable results.
For the Speculative Decoding algorit…
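The draft-and-verify idea behind speculative decoding can be sketched with toy stand-in models. This is a minimal illustration, not vLLM's implementation: `draft_model` and `target_model` below are hypothetical deterministic functions chosen so that the draft agrees with the target most of the time.

```python
def draft_model(context):
    # Cheap "draft" model (toy rule): next token is last token + 2.
    return context[-1] + 2

def target_model(context):
    # Expensive "target" model (toy rule): usually agrees with the draft,
    # but diverges whenever the last token is a multiple of 7.
    last = context[-1]
    return last + 3 if last % 7 == 0 else last + 2

def speculative_decode(context, k=4, steps=3):
    """Draft k tokens at a time, then verify them against the target model."""
    out = list(context)
    accepted_total = 0
    for _ in range(steps):
        # 1) Draft k tokens autoregressively with the cheap model.
        draft, ctx = [], list(out)
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify: accept the longest prefix the target model agrees with.
        n_accept, ctx = 0, list(out)
        for t in draft:
            if target_model(ctx) != t:
                break
            n_accept += 1
            ctx.append(t)
        out.extend(draft[:n_accept])
        # 3) Emit one target-model token at the first disagreement (or after
        #    full acceptance), so the output exactly matches what plain
        #    greedy target-model decoding would have produced.
        out.append(target_model(out))
        accepted_total += n_accept
    return out, accepted_total
```

The key property this sketch preserves is that the final sequence is identical to plain greedy decoding with the target model alone; the draft model only changes how many target evaluations are needed, not the output.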
-
For example, the 1.1B TinyLlama.
-
- I change the [batch size](https://github.com/flexflow/FlexFlow/blob/inference/inference/models/opt.cc#L71) to 2.
- Then I use the command below to run OPT-6.7B:
`../build/inference/spec…