-
version: TensorRT-LLM 0.10.0
The official script (`TensorRT-LLM/examples/multimodal/run.py`) repeats the same prompt to form a batch. But if I use different prompts to form a batch, the result is incorre…
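
One way to narrow down a report like this is to compare each prompt's batched output against the output it gets when run alone. The following is a minimal sketch, not the `run.py` API: `generate` is a hypothetical callable that takes a list of prompts and returns one string per prompt, and the comparison assumes deterministic (greedy) decoding so that identical inputs should give identical outputs.

```python
from typing import Callable, List, Sequence


def find_batch_mismatches(
    generate: Callable[[Sequence[str]], List[str]],
    prompts: Sequence[str],
) -> List[int]:
    """Return indices whose batched output differs from the single-prompt output.

    `generate` is a hypothetical stand-in for whatever wraps the model call.
    """
    batched = generate(list(prompts))
    mismatches = []
    for i, prompt in enumerate(prompts):
        # Run the same prompt alone (batch size 1) as the reference result.
        single = generate([prompt])[0]
        if batched[i] != single:
            mismatches.append(i)
    return mismatches


def fake_generate(batch: Sequence[str]) -> List[str]:
    """Deliberately buggy stand-in: every row echoes the first prompt."""
    return [batch[0].upper() for _ in batch]


def good_generate(batch: Sequence[str]) -> List[str]:
    """Well-behaved stand-in: each row depends only on its own prompt."""
    return [p.upper() for p in batch]
```

With the buggy stand-in, every row except the first is flagged. With a real model, a non-empty result would confirm that the outputs depend on the other prompts in the batch.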
-
## Description
After migrating my backend to TensorRT 10, I've noticed that some models are slower.
It looks like the issue comes from the mapping on some InstanceNormalization…
-
### System Info
- GPU: A800 x 8
- NVLink
### Who can help?
_No response_
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officially supported task…
-
Environment:
![image](https://github.com/peakhell/OCRIntegrator/assets/86536994/91f3b191-1c2c-4051-84f4-1abbf9d40f34)
```
(ocr) ➜ OCRIntegrator git:(main) ✗ pip list | grep tensor
nvidia-tensorrt …
-
### System Info
x86_64, NVIDIA A100 80GB, TensorRT-LLM v0.10.0
### Who can help?
@ncomly-nvidia
### Information
- [ ] The official example scripts
- [X] My own modified scripts
### Tasks
- […
-
I'm testing the [KV reuse feature](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/kv_cache_reuse.md).
Everything works fine until I try to use [offloading to host mem](https://github.com/N…
-
### System Info
x86_64
755G
NVIDIA T4
Ubuntu 22.04
TensorRT-LLM version: https://github.com/NVIDIA/TensorRT-LLM/archive/9691e12bce7ae1c126c435a049eb516eb119486c.zip
pip install tensorrt-llm==0.11…
-
### System Info
CPU Architecture: x86_64
CPU/Host memory size: 1024Gi (1.0Ti)
GPU properties:
GPU name: NVIDIA GeForce RTX 4090
GPU mem size: 24Gb…
-
Hi all, this issue will track the feature requests you've made to TensorRT-LLM & provide a place to see what TRT-LLM is currently working on.
Last update: `Jan 14th, 2024`
🚀 = in development
#…
-
When I run the Usage demo
```
import torch
from torch2trt import torch2trt
from torchvision.models.alexnet import alexnet
# create some regular pytorch model...
model = alexnet(pretrained=True…