-
### System Info
- CPU architecture (x86_64)
- GPU name (NVIDIA A100)
- TensorRT-LLM (version: 0.8.0.dev2024013000)
### Who can help?
_No response_
### Information
- [X] The of…
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
Thank you for your excellent work!
Recently I have been trying to accelerate Depth Anything with TensorRT on a Jetson Orin NX, but I found that inference with the converted TRT engine is not noticeably faster than with the ONNX file, and is in fact slower. Specifically:
```
ONNX Inference Time: 2.7s per image
TRT Inference Time: 3.0s per image
```
…
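One variable that often skews such comparisons is timing the first run, which includes lazy initialization. Below is a minimal timing sketch with a warm-up pass and synchronous execution; the engine path, input shape, and output shape are assumptions, not taken from the issue.

```python
# Minimal sketch: time a serialized TensorRT engine after a warm-up run.
# Engine path and tensor shapes are placeholders, not from the issue.
import time
import numpy as np
import tensorrt as trt
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda

logger = trt.Logger(trt.Logger.WARNING)
with open("depth_anything.trt", "rb") as f:  # hypothetical engine file
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

inp = np.random.rand(1, 3, 518, 518).astype(np.float32)  # assumed input shape
out = np.empty((1, 518, 518), dtype=np.float32)          # assumed output shape
d_inp = cuda.mem_alloc(inp.nbytes)
d_out = cuda.mem_alloc(out.nbytes)
cuda.memcpy_htod(d_inp, inp)

context.execute_v2([int(d_inp), int(d_out)])  # warm-up: excludes lazy init
runs = 20
start = time.perf_counter()
for _ in range(runs):
    context.execute_v2([int(d_inp), int(d_out)])
cuda.memcpy_dtoh(out, d_out)
print(f"TRT: {(time.perf_counter() - start) / runs:.3f}s per image")
```

Averaging over repeated runs after the warm-up gives a more representative per-image figure than a single cold measurement.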
-
## Description
When I use TensorRT for int8 quantization, the precision of some layers always falls back to fp32. Setting the trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS parameter does not solve the issue. W…
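For context, OBEY_PRECISION_CONSTRAINTS only enforces precisions that are explicitly set on individual layers; without per-layer assignments there is nothing to obey, and the builder is free to pick fp32. A sketch of the usual pattern follows; network population and the calibrator are omitted, and the layer filter is illustrative only.

```python
# Sketch: pin eligible layers to int8 so OBEY_PRECISION_CONSTRAINTS has
# explicit constraints to enforce. Network setup is omitted.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
# ... populate `network`, e.g. with trt.OnnxParser ...

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
# config.int8_calibrator = my_calibrator  # needed unless ranges are set

for i in range(network.num_layers):
    layer = network.get_layer(i)
    # Skip layers that cannot run in int8 anyway.
    if layer.type in (trt.LayerType.CONSTANT, trt.LayerType.SHAPE):
        continue
    layer.precision = trt.DataType.INT8
    layer.set_output_type(0, trt.DataType.INT8)
```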
-
## Description
A clear and concise description of the issue.
## Environment
**TensorRT Version**: 8.5
**NVIDIA GPU**: Jetson Orin Nano
**CUDA Version**: 11.4
**CUDNN Version*…
-
Model under test: Llama-2-7b-chat-hf
Following the instructions [here](https://github.com/NVIDIA/TensorRT-LLM/tree/release/0.5.0/examples/llama#awq), I was able to quantize the model and build the engine…
-
Hi, I'm trying to convert to a TensorRT int8 model from ONNX produced by keras2onnx.
My environment is as below:
python=3.7, keras2onnx=1.7, tensorflow=2.2.0, onnx=1.7, onnxconverter_common=1.7
My s…
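For reference, a minimal conversion sketch matching the versions listed above (keras2onnx 1.7, TensorFlow 2.2); the model is a stand-in, not the reporter's network.

```python
# Minimal keras2onnx conversion sketch; the model is a placeholder.
import tensorflow as tf
import keras2onnx

model = tf.keras.applications.MobileNetV2(weights=None)  # placeholder model
onnx_model = keras2onnx.convert_keras(model, model.name)
keras2onnx.save_model(onnx_model, "model.onnx")
```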
-
Firstly, thanks for this high-quality project.
I converted my model with torch2trt in code:
...
model_trt_float32 = torch2trt(my_model, [ims], max_batch_size=32)
model_trt…
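Alongside the float32 call above, torch2trt exposes fp16_mode and int8_mode keyword arguments; a short sketch with placeholder model and input standing in for `my_model` and `ims`:

```python
# Sketch of torch2trt precision arguments; model and input are placeholders.
import torch
import torchvision
from torch2trt import torch2trt

my_model = torchvision.models.resnet18().cuda().eval()  # placeholder model
ims = torch.randn(32, 3, 224, 224).cuda()               # placeholder input

model_trt_fp16 = torch2trt(my_model, [ims], max_batch_size=32, fp16_mode=True)
# int8 additionally calibrates; by default torch2trt calibrates on the
# example inputs, or pass int8_calib_dataset for a real calibration set.
model_trt_int8 = torch2trt(my_model, [ims], max_batch_size=32, int8_mode=True)
```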
-
## Description
What is the right way to calibrate a hybrid quantization model?
I built my TensorRT engine from the ONNX model with the code below, and I selected the `class Calibrator(trt.IInt8EntropyCa…`
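For reference, a skeleton of such a calibrator, assuming the truncated base class is trt.IInt8EntropyCalibrator2; the batch list and cache path are placeholders, not the issue's actual code.

```python
# Skeleton of an int8 entropy calibrator (assumed IInt8EntropyCalibrator2).
import numpy as np
import tensorrt as trt
import pycuda.autoinit
import pycuda.driver as cuda

class Calibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, batches, cache_file="calib.cache"):
        # batches: list of contiguous float32 arrays, all the same shape
        super().__init__()
        self.batches = batches
        self.index = 0
        self.cache_file = cache_file
        self.device_input = cuda.mem_alloc(batches[0].nbytes)

    def get_batch_size(self):
        return self.batches[0].shape[0]

    def get_batch(self, names):
        if self.index >= len(self.batches):
            return None  # returning None ends calibration
        cuda.memcpy_htod(self.device_input,
                         np.ascontiguousarray(self.batches[self.index]))
        self.index += 1
        return [int(self.device_input)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None  # no cache yet: TensorRT calibrates from data

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```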
-
I tried to convert RT-DETR-R18 from ONNX to TensorRT; int8 succeeded, but fp16 failed.
torch2onnx (STATIC): `python tools/export_onnx.py`
onnx2trt: `./trtexec --onnx=rtdetr.onnx --saveEngin…`
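For what it's worth, here is a rough Python-API counterpart of an fp16 build like the trtexec line above, with file names assumed; surfacing parser and builder errors this way can help narrow down why fp16 fails while int8 succeeds.

```python
# Rough Python-API equivalent of an fp16 trtexec build; paths are assumed.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("rtdetr.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))  # show why parsing failed
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
engine_bytes = builder.build_serialized_network(network, config)
if engine_bytes is None:
    raise RuntimeError("fp16 build failed")
with open("rtdetr_fp16.engine", "wb") as f:
    f.write(engine_bytes)
```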