-
## 🐛 Bug
When attempting to test speculative decoding using the predefined Speculative Decoding test, I see huge memory usage, which results in an OOM on my device.
## To Reproduce
Steps to r…
-
Hello!
Does TensorRT-LLM support Medusa with Mixtral 8x7B?
My understanding is that right now the Medusa [convert_checkpoint.py](https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/medusa/c…
-
### Your current environment
```text
The output of `python collect_env.py`
Collecting environment information...
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12…
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
### Before submitting a bug report
- [X] I updated to the latest version of Multi-Account Container and tested whether I could reproduce the issue
- [X] I searched for existing reports to see if it hasn't a…
-
### Your current environment
The output of `python collect_env.py`
```text
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A…
-
### Prerequisite
- [X] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the ex…
-
### Context
End Of Sequence tokens are an essential part of LLM training and inference. You can find more details in [this comment](https://discuss.huggingface.co/t/how-does-gpt-decide-to-stop-gene…
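To make the role of the EOS token concrete, here is a minimal toy sketch (not from the original issue) of a greedy decode loop that stops as soon as the model emits the EOS token id. Both `next_token` and `EOS_ID` are hypothetical stand-ins for a real model's sampling step and tokenizer configuration:

```python
EOS_ID = 2  # hypothetical EOS token id (e.g. Llama's tokenizer uses 2)

def next_token(tokens):
    # Dummy stand-in for a model's argmax step: emits incrementing
    # ids, then EOS once the sequence reaches 5 tokens.
    return tokens[-1] + 1 if len(tokens) < 5 else EOS_ID

def generate(prompt_ids, max_new_tokens=16):
    tokens = list(prompt_ids)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        if tok == EOS_ID:  # stop generating at end-of-sequence
            break
        tokens.append(tok)
    return tokens

print(generate([3]))  # stops after the dummy model emits EOS
```

Without the `tok == EOS_ID` check, generation would always run to `max_new_tokens`, which is exactly why EOS handling matters for both training targets and inference stopping criteria.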
-
### Your current environment
```
PyTorch version: 2.1.2+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A
OS: Ubuntu 16.04.7 LTS (x86_64)
GCC versio…
-
Please add support for this model: https://github.com/vikhyat/moondream
An additional idea, which may or may not be feasible (I do not know), is speculative decoding using a smaller model like th…
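For intuition, speculative decoding can be sketched as: a cheap draft model proposes a short run of tokens, and the expensive target model verifies them in one pass, keeping the longest agreeing prefix plus one corrected token. The toy below is a greedy-verification sketch with hypothetical `draft_next`/`target_next` stand-ins, not tied to any real library:

```python
K = 4  # number of tokens the draft model proposes per step

def draft_next(tokens):
    # Hypothetical cheap draft model: usually agrees with the target,
    # but drifts whenever the last token is a multiple of 3.
    return tokens[-1] + (2 if tokens[-1] % 3 == 0 else 1)

def target_next(tokens):
    # Hypothetical expensive target model: always increments by 1.
    return tokens[-1] + 1

def speculative_step(tokens):
    # 1) Draft proposes K tokens autoregressively.
    draft = list(tokens)
    for _ in range(K):
        draft.append(draft_next(draft))
    proposals = draft[len(tokens):]
    # 2) Target verifies the proposals; accept the longest prefix on
    #    which the two models agree, then emit one corrected token.
    accepted = []
    ctx = list(tokens)
    for p in proposals:
        t = target_next(ctx)
        if p == t:
            accepted.append(p)
            ctx.append(p)
        else:
            accepted.append(t)  # target's correction ends the step
            break
    return tokens + accepted

print(speculative_step([1]))
```

The payoff is that several tokens can be accepted per target-model forward pass when the draft agrees often, which is why a small sibling model is an appealing draft candidate.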