-
### Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.…
-
### Your current environment
```text
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.10 (x86_64)
GCC version: (…
```
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
```
-
### System Info
GPU: 2 × A30
TensorRT-LLM version: v0.9.0
Model: Vicuna 13B
### Who can help?
@byshiue
### Information
- [X] The official example scripts
- [ ] My own modified scripts
#…
-
I am starting this issue to do more thorough benchmarking than the [notebooks](/notebooks) in the repo.
What should we measure:
1. Time for generation
2. Max GPU VRAM
3. Accuracy
Hardw…
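The three metrics above can be collected with a small harness. Below is a hedged sketch, not part of the repo's notebooks: `run_generation` is a hypothetical callable standing in for one generation call, and `torch` is treated as optional so the harness also runs without a GPU.

```python
# Sketch of a benchmark harness for generation time and peak GPU VRAM.
# `run_generation` is a hypothetical callable wrapping one generate step.
import time

try:
    import torch  # used only for peak-VRAM tracking when CUDA is available
except ImportError:  # torch not installed: skip the VRAM metric
    torch = None


def benchmark(run_generation):
    """Return (elapsed_seconds, peak_vram_bytes_or_None, result) for one call."""
    use_cuda = torch is not None and torch.cuda.is_available()
    if use_cuda:
        torch.cuda.reset_peak_memory_stats()
        torch.cuda.synchronize()  # exclude queued work from the timing
    start = time.perf_counter()
    result = run_generation()
    if use_cuda:
        torch.cuda.synchronize()  # wait for the kernel to actually finish
    elapsed = time.perf_counter() - start
    peak_vram = torch.cuda.max_memory_allocated() if use_cuda else None
    return elapsed, peak_vram, result
```

Accuracy would be measured separately, by scoring the returned `result` against reference outputs.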
-
**Problem**
I need to create a lot of small JSONs with an LLM. To do so, I started with [Jsonformer](https://github.com/1rgs/jsonformer). However, since it is no longer maintained and my colleagu…
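For context, Jsonformer's core trick is that the program emits the JSON scaffolding itself and asks the model only for the individual leaf values, so the output is structurally valid by construction. A minimal pure-Python sketch of that idea (not Jsonformer's actual code; `gen_value` is a hypothetical callback standing in for a constrained LLM call):

```python
# Sketch of schema-guided generation: walk a JSON schema, emit the
# structure deterministically, and delegate only leaf values to
# `gen_value(prompt, field_type)` -- here, any plain function.

def fill_schema(schema, gen_value, prompt=""):
    t = schema["type"]
    if t == "object":
        # keys come from the schema, never from the model
        return {
            key: fill_schema(sub, gen_value, prompt + f" {key}:")
            for key, sub in schema["properties"].items()
        }
    if t == "array":
        n = int(gen_value(prompt + " length:", "number"))
        return [fill_schema(schema["items"], gen_value, prompt) for _ in range(n)]
    # leaf: ask the "model" for a single typed value
    return gen_value(prompt, t)
```

With a stub callback such as `lambda prompt, t: "Alice" if t == "string" else 30`, a schema with `name`/`age` properties yields a dict with exactly those keys, which is the guarantee a real constrained decoder provides.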
-
# `generate` 🤜 🤛 `torch.compile`
This issue is a tracker of the compatibility between `.generate` and `torch.compile` ([intro docs by pytorch](https://pytorch.org/tutorials/intermediate/torch_comp…
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorc…
```
-
### 🚀 The feature, motivation and pitch
We recently read a paper in which the vLLM team proposed a method called **SmartSpec**.
We believe that this research, which dynamically adjusts the speculation …
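To make the idea concrete, here is a hypothetical sketch of dynamically adjusting the speculation length (this is *not* the actual SmartSpec algorithm, only an illustration of the general mechanism): grow the number of speculated tokens when the target model accepts most of them, shrink it otherwise.

```python
# Hypothetical acceptance-rate controller for speculative decoding.
# Not vLLM's SmartSpec; just a sketch of "dynamic speculation length".

class SpeculationController:
    def __init__(self, k=4, k_min=1, k_max=8, grow_at=0.8, shrink_at=0.4):
        self.k = k                    # current speculation length
        self.k_min, self.k_max = k_min, k_max
        self.grow_at, self.shrink_at = grow_at, shrink_at

    def update(self, accepted, proposed):
        """Adjust k from the last step's acceptance rate and return it."""
        rate = accepted / proposed if proposed else 0.0
        if rate >= self.grow_at:
            self.k = min(self.k + 1, self.k_max)   # cheap to speculate more
        elif rate <= self.shrink_at:
            self.k = max(self.k - 1, self.k_min)   # wasted draft work, back off
        return self.k
```

A real system would also weigh batch size and queue depth, since speculating more tokens trades draft-model compute against target-model verification throughput.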
-
Does it support batch size > 1?