-
https://github.com/ollama/ollama
https://github.com/abetlen/llama-cpp-python
https://github.com/vllm-project/vllm
-
Expected release date: Mar 15th, 2024
# General
1. [x] Support general page table layout (@yzh119)
2. [ ] sm70/75 compatibility (@yzh119)
3. [ ] performance: using fp16 as intermediate data ty…
-
target task: summarization
distillation: teacher → student (draft model)
t5-xl: target, t5-small: drafter
n-gram: ...?
n-gram: KD
The n-gram drafter should be trained on a model-generated dataset.
…
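A minimal sketch of the note above, using hypothetical names and a toy token-id corpus standing in for real model-generated data: build a bigram table from sequences sampled from the target model, then greedily propose draft tokens from it.

```python
from collections import Counter, defaultdict

def build_ngram_drafter(token_seqs, n=2):
    """Count n-gram continuations over model-generated token sequences."""
    counts = defaultdict(Counter)
    for seq in token_seqs:
        for i in range(len(seq) - n + 1):
            ctx = tuple(seq[i:i + n - 1])
            counts[ctx][seq[i + n - 1]] += 1
    return counts

def draft(counts, context, n=2, k=3):
    """Greedily propose up to k draft tokens from the n-gram table."""
    out, ctx = [], list(context)
    for _ in range(k):
        key = tuple(ctx[-(n - 1):])
        if key not in counts:
            break  # unseen context: stop drafting
        tok = counts[key].most_common(1)[0][0]
        out.append(tok)
        ctx.append(tok)
    return out

# Toy "model-generated" dataset; integers stand in for real token ids.
data = [[1, 2, 3, 4], [1, 2, 3, 5], [2, 3, 4, 6]]
table = build_ngram_drafter(data)
print(draft(table, [1, 2]))  # → [3, 4, 6]
```

Because the table is built from the target model's own outputs rather than external text, the drafter's proposals tend to match what the target would have generated, which is the point of training the n-gram drafter on a model-generated dataset.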
-
### Your current environment
```text
Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A
OS: Ubuntu …
-
### Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md)…
-
https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/llama.md
If possible, please add a speculative decoding example to the llama docs.
-
Hi @WoosukKwon and @zhuohan123,
Fantastic project!
I was taking a stab at implementing a version of **greedy** lookahead-decoding. Given some candidate completions, I was trying to:
1. Fork …
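The greedy verification step behind this idea can be sketched outside of vLLM. This is an assumption-laden illustration, not vLLM's API: `target_next_token` is a hypothetical callable standing in for the target model's argmax decode (in a real implementation the target would score all candidate positions in one batched forward pass rather than one call per token).

```python
def verify_greedy(target_next_token, prefix, draft_tokens):
    """Accept draft tokens while they match the target's greedy choice.

    target_next_token(tokens) -> the target model's argmax next token.
    Returns (accepted_tokens, next_token_from_target).
    """
    accepted = []
    tokens = list(prefix)
    for d in draft_tokens:
        t = target_next_token(tokens)
        if t != d:
            # First mismatch: keep the target's token, discard the rest.
            return accepted, t
        accepted.append(d)
        tokens.append(d)
    # All drafts accepted; the target still yields one bonus token.
    return accepted, target_next_token(tokens)

# Toy target model that always continues counting upward.
target = lambda toks: toks[-1] + 1
print(verify_greedy(target, [0], [1, 2, 5]))  # → ([1, 2], 3)
```

With greedy decoding, this acceptance rule reproduces exactly the sequence the target model would have produced on its own; the drafts only change how many tokens each target step can commit.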
-
### ⚠️ Please check that this feature request hasn't been suggested before.
- [X] I searched previous [Ideas in Discussions](https://github.com/OpenAccess-AI-Collective/axolotl/discussions/categories…
-
### Question Validation
- [X] I have searched both the documentation and discord for an answer.
### Question
I installed LlamaIndex with the command `pip install llama-index` and install t…
-
As described: the speculative decoding implementation works, but it should be sped up.