-
### System Info
CPU: x86_64
GPU: NVIDIA L40
TensorRT-LLM branch: v0.10.0
Driver Version: 535.161.07
CUDA Version: 12.4
### Who can help?
@kaiyux
…
-
### System Info
A100-PCIe-40GB
TensorRT-LLM version: 0.11.0
### Who can help?
@Tracin
### Information
- [ ] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An offici…
-
### System Info
NVIDIA-SMI 535.154.05
Driver Version: 535.154.05
CUDA Version: 12.4
- GPU properties
  - GPU name: NVIDIA L20
  - GPU memory size: 46068MiB
- Libraries
  - Te…
-
### System Info
I hit a trtllm-build issue.
GPU: RTX 3090
I followed the official script through the steps below.
1. After installing the NVIDIA Container Toolkit, I ran the following:
```
docker run -…
```
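The original command is cut off, but for reference the usual pattern from the TensorRT-LLM docs is to launch an NGC container with GPU access; a minimal sketch (the image tag and mount point are my assumptions, not the truncated original):
```bash
# Hypothetical reconstruction: the exact command above is truncated.
# The image tag and mount point are assumptions; use whatever the official
# script specifies. --gpus all requires the NVIDIA Container Toolkit, and
# --ipc=host avoids shared-memory limits for multi-process workloads.
docker run --rm -it \
  --gpus all \
  --ipc=host \
  -v "$(pwd)":/workspace \
  nvcr.io/nvidia/pytorch:24.03-py3
```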
-
Hi, thanks for your great work on the LLM decoding process. I tested the code and got the expected decoding speedup for llama2-7B, but the end-to-end time does not seem to change much (61s ->…
-
### System Info
- Host: VMware ESXi 7
- Host NVIDIA driver: 550.54.16
- VM CPU architecture: x86_64
- VM NVIDIA driver: 550.54.15
- VM OS: Ubuntu 22.04 LTS
- Physical GPU: A100
- TensorRT-LLM…
-
### System Info
EC2 instance: G5.48xl
NVIDIA driver: 535.161.08
CUDA: 12.2
commit 5d8ca2faf74c494f220c8f71130340b513eea9a9
Torch: 2.3.0
### Who can help?
@byshiue running into the issue with h…
-
I'm reading the manual here: https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llama/README.md
The scripts are quite simple; do they ensure the best performance?
```
python convert_checkpoint.py …
```
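For reference, the flow I mean is roughly the sketch below (paths are placeholders; flag names vary between releases, so check `trtllm-build --help`):
```bash
# Minimal convert-then-build flow from the llama example.
# Paths are placeholders; flags like --gemm_plugin and --max_batch_size
# affect performance, and their names/defaults vary between releases.
python convert_checkpoint.py \
  --model_dir ./llama-2-7b-hf \
  --output_dir ./ckpt \
  --dtype float16

trtllm-build \
  --checkpoint_dir ./ckpt \
  --output_dir ./engine \
  --gemm_plugin float16 \
  --max_batch_size 8
```
My understanding is that the defaults are sensible but not guaranteed to be peak; the plugin choices and the max_* build limits are the usual tuning knobs.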
-
I'm doing story writing with KoboldCpp. At some point the story gets longer than the context, and KoboldCpp starts evicting tokens from the beginning with the (newer) ContextShift feature. Sometim…
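For context, ContextShift is enabled by default in recent KoboldCpp builds; a typical launch looks like the sketch below (the model path is a placeholder, and the flag names are from recent releases, so check `--help` for your version):
```bash
# Launch KoboldCpp with a fixed context window. In recent builds,
# ContextShift is on by default and evicts tokens from the start of the
# story once it outgrows the window; pass --noshift to disable it.
# The model path is a placeholder.
python koboldcpp.py --model ./model.gguf --contextsize 4096
```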
-
### System Info
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-…