-
### System Info
- GPU Name: NVIDIA GeForce RTX 3080 Ti
- System RAM: 65GB
- TensorRT-LLM branch `rel`
### Who can help?
@Tracin
@byshiue
### Information
- [ ] The official example scripts
- [X…
-
Hi, I'm running on an AWS A10G instance and I'm trying to benchmark different setups.
I sharded the model across 2 GPUs to make it faster, but I'm getting the same latency.
Does this make…
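One plausible reason 2-way tensor parallelism doesn't halve latency for single-request decoding is that each sharded layer adds an all-reduce, and some per-token work doesn't shrink with the GPU count. The toy model below (not TensorRT-LLM code; all timing numbers are made-up assumptions for illustration) shows how communication overhead can eat most of the expected speedup:

```python
# Toy latency model (illustrative only, not TensorRT-LLM code) showing why
# 2-way tensor parallelism may not halve per-token latency: the sharded
# compute splits across GPUs, but each layer adds an all-reduce, and any
# non-sharded work stays constant.

def per_token_latency_ms(tp_size, compute_ms=20.0, fixed_ms=4.0, allreduce_ms=8.0):
    """All parameters are illustrative assumptions, not measurements."""
    comm = allreduce_ms if tp_size > 1 else 0.0
    return compute_ms / tp_size + fixed_ms + comm

single = per_token_latency_ms(1)  # 20 + 4 + 0 = 24.0 ms
dual = per_token_latency_ms(2)    # 10 + 4 + 8 = 22.0 ms
print(f"TP=1: {single:.1f} ms, TP=2: {dual:.1f} ms, speedup: {single / dual:.2f}x")
```

With these assumed numbers the "speedup" is only about 1.09x; in practice, interconnect bandwidth (PCIe vs NVLink) and batch size determine whether tensor parallelism helps latency at all.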
-
### System Info
CPU Architecture: x86_64
CPU/Host memory size: 1024Gi (1.0Ti)
GPU properties:
GPU name: NVIDIA GeForce RTX 4090
GPU mem size: 24Gb…
-
```dockerfile
# Base image
FROM nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3
USER root
RUN apt update && apt install --no-install-recommends rapidjson-dev python-is-python3 git-lfs curl uuid…
```
-
Hi again,
I've successfully quantized an ONNX model to int8, then converted it to a TensorRT engine and noticed a performance increase compared to fp16.
```bash
python -m modelopt.onnx.quantizati…
```
-
## Description
I tried to convert the model to int8, but it fails with the error below:
[E] Error[10]: Error Code: 10: Could not find any implementation for node {ForeignNode[/transformer/attention_…
-
### System Info
- CPU architecture: x86_64
- CPU memory: 110GB
- GPU properties:
- GPU Name: NVIDIA A100 80GB PCIe
- Libraries:
- tensorrt-llm==0.11.0.dev2024060400
- CUDA Ver…
-
### System Info
- CPU architecture: x86_64
- CPU memory size: 128G
- GPU name: NVIDIA GeForce GTX 1660S
- GPU memory size: 6G
- TensorRT-LLM branch: main
- TensorRT-LLM commit: 9691e12
- Contai…
gyr66 updated 3 weeks ago
-
Hi, I have already installed mmdeploy from **git clone git@github.com:drilistbox/mmdeploy.git**, but I get an error when I run the command: python tools/convert_bevdet_to_TRT.py $config $checkpoin…
-
Opening a new issue as #237 was closed prematurely.
It seems that engines built using the `--paged_kv_cache` flag leak GPU memory. Below is a minimal reproducible example that can be used to …
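A library-agnostic way to confirm a per-iteration leak like this is to warm up the allocator first (pools legitimately grow on the first few calls), then measure average memory growth over repeated identical calls. In the sketch below, `sample_used` and `run_once` are hypothetical placeholders: against TensorRT-LLM you would sample GPU memory (e.g. via NVML) and run one generation per iteration.

```python
# Generic sketch for confirming a per-iteration memory leak.
# `sample_used` and `run_once` are placeholders: in practice you would sample
# GPU memory use (e.g. via NVML) and run one generate() call per iteration.

def measure_leak(sample_used, run_once, warmup=2, iters=10):
    """Return average memory growth (bytes) per iteration after warm-up."""
    for _ in range(warmup):      # let allocator pools grow legitimately
        run_once()
    baseline = sample_used()
    for _ in range(iters):
        run_once()
    return (sample_used() - baseline) / iters

# Demo with a fake workload that leaks 1 MiB per call.
state = {"used": 0}
def fake_run():
    state["used"] += 1 << 20

growth = measure_leak(lambda: state["used"], fake_run)
print(f"avg growth per iteration: {growth / 2**20:.1f} MiB")  # 1.0 MiB
```

Flat memory after warm-up points to normal pool growth; steady per-iteration growth, as the fake workload shows, is the signature of a real leak.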