-
### Your current environment
CI environment
### 🐛 Describe the bug
See https://github.com/vllm-project/vllm/pull/5286 and https://github.com/vllm-project/vllm/issues/5152
My guess is the way we …
-
WIP project roadmap for LoRAX. We'll continue to update this over time.
# v0.10
- [ ] Speculative decoding adapters
- [ ] AQLM
# v0.11
- [ ] Prefix caching
- [ ] BERT support
- [ ] Embe…
-
- [ ] [At the Intersection of LLMs and Kernels - Research Roundup](https://charlesfrye.github.io/programming/2023/11/10/llms-systems.html)
# At the Intersection of LLMs and Kernels - Research Roundup…
-
### Your current environment
root@9c92d584ab5f:/app# python3 ./collect_env.py
Collecting environment information...
WARNING 05-15 15:13:52 ray_utils.py:46] Failed to import Ray with ModuleNotFound…
-
As we plan to move some states from the `BatchConfig` to the `RequestManager`, some fields in `BatchConfig` are rendered redundant. The following are the data members of the current `BatchConfig`.
``…
-
Many-Shot In-Context Learning
https://arxiv.org/abs/2404.11018
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
https://arxiv.org/abs/2404.14219
https://github.com/apple…
-
Great work!
I was wondering whether the distilled version might still be compatible with CTranslate2 / faster-whisper? I understand the changes to the decoder might require some changes there, not …
-
Hi all, this issue will track the feature requests you've made to TensorRT-LLM and provide a place to see what TRT-LLM is currently working on.
Last update: `Jan 14th, 2024`
🚀 = in development
#…
-
### Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.…
-
This seems like an obvious feature to me, though not one that is simple to implement: is speculative sampling on the roadmap?
The idea would be using a second tiny-model combined with e.g. for greedy validatio…
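
For readers unfamiliar with the idea, here is a minimal sketch of one common form of speculative decoding with greedy validation. It is only an illustration under stated assumptions, not any project's implementation. The names `speculative_greedy_step`, `generate`, `greedy_generate`, `toy_target` and `toy_draft` are hypothetical, and each "model" is reduced to a plain function that maps a non-empty token prefix to its greedy next token.

```python
from typing import Callable, List

# Assumption: a "model" is any function from a non-empty token prefix
# to its greedy next token. Real models return logits instead.
NextToken = Callable[[List[int]], int]


def speculative_greedy_step(target: NextToken, draft: NextToken,
                            prefix: List[int], k: int = 4) -> List[int]:
    """Run one draft-then-verify round and return the accepted tokens."""
    # 1. The small draft model proposes k tokens autoregressively.
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft(prefix + proposed))

    # 2. The large target model checks each proposed position.
    #    (A real system would score all positions in one batched pass;
    #    this sketch calls the target once per position for clarity.)
    accepted: List[int] = []
    for tok in proposed:
        expected = target(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every draft token matched, so the target adds one more token.
    accepted.append(target(prefix + accepted))
    return accepted


def generate(target: NextToken, draft: NextToken, prompt: List[int],
             max_new_tokens: int, k: int = 4) -> List[int]:
    """Generate with repeated speculative rounds, truncated to the budget."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_greedy_step(target, draft, out, k))
    return out[:len(prompt) + max_new_tokens]


def greedy_generate(target: NextToken, prompt: List[int],
                    max_new_tokens: int) -> List[int]:
    """Baseline: plain greedy decoding with the target model only."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(target(out))
    return out


# Toy stand-ins for the two models (hypothetical, for illustration only).
def toy_target(prefix: List[int]) -> int:
    return (prefix[-1] + 1) % 10          # counts 0, 1, ..., 9, 0, ...


def toy_draft(prefix: List[int]) -> int:
    # Agrees with the target except right after a 4, where it guesses wrong.
    return 0 if prefix[-1] == 4 else (prefix[-1] + 1) % 10
```

With greedy validation the output is identical to plain greedy decoding with the target model. Any speed-up comes from how many draft tokens are accepted per target pass, so it depends on how often the small model agrees with the large one.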