-
### mpv Information
_No response_
### Important Information
```text
All versions of mpv
```
### Reproduction Steps
Simply changing the FPS of media (and adding pitch) manually is mo…
-
### 🚀 The feature, motivation and pitch
Enables zero-overhead structured generation in LLM inference.
https://github.com/mlc-ai/xgrammar
### Alternatives
_No response_
### Additional context
h…
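A minimal sketch of the idea behind zero-overhead structured generation: precompile, for each grammar state, a mask over the vocabulary, then apply it to the logits with one cheap operation per decoding step. This is illustrative only; the toy vocabulary, `MASKS` table, and `apply_mask` helper are assumptions for the sketch, not the xgrammar API.

```python
# Illustrative sketch, NOT the xgrammar API: structured generation can be
# made near zero-overhead by precompiling per-grammar-state token masks
# and applying them to the logits in a single pass per step.

VOCAB = ["{", "}", '"', "a", "b", ":", ","]  # toy vocabulary

# Hypothetical precompiled masks: grammar state -> allowed token ids.
MASKS = {
    "start": {0},          # a JSON object must open with "{"
    "in_object": {1, 2},   # may close, or start a quoted key
}

def apply_mask(logits, state):
    """Set every grammar-forbidden token's logit to -inf."""
    allowed = MASKS[state]
    return [x if i in allowed else float("-inf")
            for i, x in enumerate(logits)]

logits = [0.5, 0.1, 0.2, 0.9, 0.3, 0.0, 0.1]
masked = apply_mask(logits, "start")
best = max(range(len(masked)), key=masked.__getitem__)
print(VOCAB[best])  # only "{" survives the mask
```

Because the masks are computed ahead of time, the per-step cost is a single vectorized masking operation, which is where the "zero-overhead" claim comes from.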
-
### 🚀 The feature, motivation and pitch
It would be great to have an API for evicting all KV cache from GPU memory.
By mentioning `sleep mode`, I mean, if there are some technical consi…
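One way to picture the requested behavior: offload all KV cache blocks from GPU memory to host memory on `sleep`, freeing the GPU for other work, and restore them on `wake_up`. The class and method names below are hypothetical, a sketch of the requested API rather than anything vLLM provides; the "tensors" are stubbed with strings.

```python
# Hypothetical sketch of a "sleep mode" API; none of these names come
# from vLLM itself. Sleep moves every KV block off the GPU; wake-up
# restores them so in-flight requests can resume without re-prefilling.

class KVCache:
    def __init__(self, blocks):
        self.gpu_blocks = dict(blocks)   # block_id -> tensor (stubbed)
        self.cpu_blocks = {}

    def sleep(self):
        """Evict all KV blocks from GPU memory to host memory."""
        self.cpu_blocks.update(self.gpu_blocks)
        self.gpu_blocks.clear()

    def wake_up(self):
        """Bring the evicted blocks back onto the GPU."""
        self.gpu_blocks.update(self.cpu_blocks)
        self.cpu_blocks.clear()

cache = KVCache({0: "k0v0", 1: "k1v1"})
cache.sleep()
print(len(cache.gpu_blocks))  # 0: GPU memory is free for other work
cache.wake_up()
print(len(cache.gpu_blocks))  # 2: cached state restored
```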
-
### 🚀 The feature, motivation and pitch
I tried loading a finetuned LoRA adapter for Llama 3.2 11B Vision Instruct using the serve command (OpenAI client) and got this error message.
```shell
ERROR 12-0…
-
### 🚀 The feature, motivation and pitch
```
warnings.warn(
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, …
-
### What Happened
As we improve the blocks, more settings are being added to them. These are usually left at their defaults and are intended for advanced usage. For example:
![Captura desde 2024-10-2…
manuq updated 1 month ago
-
I have gotten most of this done. Right now there is a replayer engine that aims to closely mimic GAX's functionality, and an editor/tracker for the .gax files. These are the current stand…
-
### 🚀 The feature, motivation and pitch
I'm working on evaluating Llama3.1-70B on the [MMLU](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/mmlu/README.md) and [MMLU-Pro]…
-
### 🚀 The feature, motivation and pitch
For LLM inference, requests per second (QPS) is not constant, so the vLLM engine needs to be launched on demand. For elastic instances, it is important to reduce TTFT (Time…
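For reference, TTFT for an on-demand engine is usually measured as the wall-clock time from submitting the request until the first streamed token arrives, which includes any engine startup cost. The sketch below is an assumption-laden illustration: `fake_stream` stands in for a streaming client and is not a vLLM API.

```python
# Minimal sketch of measuring TTFT (time to first token). `fake_stream`
# is a stand-in for a real streaming completion client; the 50 ms sleep
# simulates engine startup plus prefill before the first token.
import time

def fake_stream():
    time.sleep(0.05)          # simulated engine start + prefill
    yield "Hello"             # first token
    yield " world"

def measure_ttft(stream):
    start = time.monotonic()
    first = next(stream)      # block until the first token arrives
    return first, time.monotonic() - start

token, ttft = measure_ttft(fake_stream())
print(token)                  # "Hello"
print(ttft >= 0.05)           # startup latency dominates TTFT here
```

For an elastic deployment, the startup term in this measurement is exactly what on-demand launching needs to shrink.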
-
### 🚀 The feature, motivation and pitch
## Motivation
If an engine is currently handling a single long sequence in the prefill stage, any other incoming sequence has to wait until the LLM is done …
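The head-of-line blocking described above is what chunked prefill addresses: instead of prefilling a long prompt in one step, the scheduler splits it into fixed-size chunks and interleaves decode steps between them. The toy scheduler below is an illustration of that idea only; its names and structure are assumptions, not vLLM internals.

```python
# Sketch of the chunked-prefill idea: split a long prompt into chunks
# and run pending decode steps between chunks, so decodes of other
# requests no longer wait for the entire prefill to finish.

def schedule(prompt_len, chunk_size, pending_decodes):
    """Return the interleaved step sequence a chunked scheduler runs."""
    steps = []
    done = 0
    while done < prompt_len:
        take = min(chunk_size, prompt_len - done)
        steps.append(("prefill", take))
        done += take
        if pending_decodes:             # decodes no longer starve
            steps.append(("decode", pending_decodes))
    return steps

plan = schedule(prompt_len=10, chunk_size=4, pending_decodes=2)
print(plan)
# [('prefill', 4), ('decode', 2), ('prefill', 4), ('decode', 2),
#  ('prefill', 2), ('decode', 2)]
```

With no pending decodes the schedule degenerates to plain prefill, so the chunking adds overhead only when there is decode work to interleave.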