-
Need to use TensorRT; something like https://github.com/noahmr/yolov5-tensorrt, adapted for YOLOv8.
➝ https://github.com/triple-Mu/YOLOv8-TensorRT/blob/main/infer-det.py
The implementation will be in C++.
Also look at https://gi…
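
A rough sketch of what the C++ inference side could look like, assuming a TensorRT engine serialized offline (e.g. with trtexec) and the TensorRT 8.x bindings API; the engine path, binding names (`images`, `output0`), and tensor shapes below are placeholders, not taken from either repo:

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

// Minimal logger required by the TensorRT C++ API.
class Logger : public nvinfer1::ILogger {
public:
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
};

int main() {
    Logger logger;

    // Load an engine serialized offline (placeholder path).
    std::ifstream file("yolov8n.engine", std::ios::binary);
    std::vector<char> engineData((std::istreambuf_iterator<char>(file)),
                                 std::istreambuf_iterator<char>());

    // Deserialize the engine and create an execution context.
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(engineData.data(), engineData.size());
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    // Device buffers for the two bindings (names/shapes depend on the export).
    int inputIndex = engine->getBindingIndex("images");    // e.g. 1x3x640x640
    int outputIndex = engine->getBindingIndex("output0");  // e.g. 1x84x8400
    void* buffers[2];
    cudaMalloc(&buffers[inputIndex], 1 * 3 * 640 * 640 * sizeof(float));
    cudaMalloc(&buffers[outputIndex], 1 * 84 * 8400 * sizeof(float));

    // ... copy a preprocessed image into buffers[inputIndex], then run:
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    context->enqueueV2(buffers, stream, nullptr);
    cudaStreamSynchronize(stream);
    // ... copy buffers[outputIndex] back to the host, decode boxes, run NMS.

    cudaStreamDestroy(stream);
    cudaFree(buffers[inputIndex]);
    cudaFree(buffers[outputIndex]);
    return 0;
}
```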
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussi…
-
Implement quantization-aware training (QAT) and quantized inference for Jetson.
**References**
- [Pytorch QAT Blog Post](https://pytorch.org/blog/quantization-aware-training/)
- [Lil'Log Blog Post](…
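
On the inference side, one possible path (my assumption, not spelled out in the references) is: run QAT in PyTorch, export to ONNX with Q/DQ (QuantizeLinear/DequantizeLinear) nodes, then build an INT8 TensorRT engine on the Jetson. A minimal sketch of that last step, with `model-qat.onnx` and the output path as placeholders:

```cpp
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <fstream>
#include <iostream>

// Minimal logger required by the TensorRT C++ API.
class Logger : public nvinfer1::ILogger {
public:
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
};

int main() {
    Logger logger;

    // TensorRT 8.x-style builder/network/parser setup.
    auto builder = nvinfer1::createInferBuilder(logger);
    auto network = builder->createNetworkV2(
        1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
    auto parser = nvonnxparser::createParser(*network, logger);

    // Parse the QAT export; it already contains Q/DQ nodes with learned scales.
    if (!parser->parseFromFile("model-qat.onnx",
                               static_cast<int>(nvinfer1::ILogger::Severity::kWARNING))) {
        std::cerr << "Failed to parse ONNX model" << std::endl;
        return 1;
    }

    // Enable INT8. With explicit Q/DQ quantization no calibration cache is needed;
    // the scales come from training.
    auto config = builder->createBuilderConfig();
    config->setFlag(nvinfer1::BuilderFlag::kINT8);
    config->setMemoryPoolLimit(nvinfer1::MemoryPoolType::kWORKSPACE, 1ULL << 30);

    // Build and serialize the engine, then write it out for the Jetson runtime.
    nvinfer1::IHostMemory* serialized = builder->buildSerializedNetwork(*network, *config);
    if (!serialized) {
        std::cerr << "Engine build failed" << std::endl;
        return 1;
    }
    std::ofstream out("model-qat-int8.engine", std::ios::binary);
    out.write(static_cast<const char*>(serialized->data()), serialized->size());
    return 0;
}
```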
-
I want to deploy Triton + TensorRT-LLM, but due to some constraints I cannot use a Docker container. I have figured out that I need to build the following repos:
1. https://github.com/triton-inference-server…
-
void FeatureExtraction::doInference_run(float* inputBuffer, float* outputBuffer) {
cudaMemcpyAsync(buffers[inputIndex], inputBuffer, inputStreamSize * sizeof(float), cudaMemcpyHostToDevice, c…
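
For context, the usual pattern this method appears to follow is: async host-to-device copy, enqueue, async device-to-host copy, then a stream sync. A self-contained sketch of that flow (TensorRT 8.x bindings API); the parameter names are assumptions, not the actual class members:

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <cstddef>

// Asynchronous inference flow: H2D copy -> enqueue -> D2H copy -> sync.
// All work is queued on one CUDA stream and only the final sync blocks.
void runInference(nvinfer1::IExecutionContext* context,
                  void** deviceBuffers, int inputIndex, int outputIndex,
                  const float* hostInput, std::size_t inputCount,
                  float* hostOutput, std::size_t outputCount,
                  cudaStream_t stream) {
    // Copy the preprocessed input from host to the device-side input binding.
    cudaMemcpyAsync(deviceBuffers[inputIndex], hostInput,
                    inputCount * sizeof(float), cudaMemcpyHostToDevice, stream);

    // Launch inference asynchronously on the same stream.
    context->enqueueV2(deviceBuffers, stream, nullptr);

    // Copy the output binding back to the host, still asynchronously.
    cudaMemcpyAsync(hostOutput, deviceBuffers[outputIndex],
                    outputCount * sizeof(float), cudaMemcpyDeviceToHost, stream);

    // Block until everything queued on the stream has finished.
    cudaStreamSynchronize(stream);
}
```

Note that the copies only truly overlap with other work if the host buffers are pinned (allocated with cudaHostAlloc); with pageable memory the async copies are effectively staged and behave more synchronously.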
-
## Environment
- **GPUs**: 4x NVIDIA A100 (80GB) (NVLink, Azure Standard_NC96ads_A100_v4)
- **TensorRT-LLM Version**: 0.15.0.dev2024102200
- **Environment**: Docker container
- **Memory Usage per GPU…
-
## Description
NMS layers are much slower in TensorRT than in PyTorch (running at 44% of the PyTorch performance), and I'm looking for any possible workaround. This seems to be acknowledged as a known issue in the Tenso…
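
One workaround to consider (my suggestion, not something confirmed in the truncated text above) is to drop the NMS layer from the exported graph and run NMS outside TensorRT on the decoded detections. A minimal greedy CPU version of that step, with the `Detection` layout as a placeholder:

```cpp
#include <algorithm>
#include <vector>

// Decoded detection in corner format; layout is an assumption for illustration.
struct Detection {
    float x1, y1, x2, y2;
    float score;
    int classId;
};

// Intersection-over-union of two boxes.
static float iou(const Detection& a, const Detection& b) {
    float ix1 = std::max(a.x1, b.x1), iy1 = std::max(a.y1, b.y1);
    float ix2 = std::min(a.x2, b.x2), iy2 = std::min(a.y2, b.y2);
    float inter = std::max(0.0f, ix2 - ix1) * std::max(0.0f, iy2 - iy1);
    float areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
    float areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
    return inter / (areaA + areaB - inter + 1e-9f);
}

// Greedy per-class NMS over score-sorted candidates.
std::vector<Detection> nms(std::vector<Detection> dets, float iouThreshold) {
    std::sort(dets.begin(), dets.end(),
              [](const Detection& a, const Detection& b) { return a.score > b.score; });
    std::vector<Detection> kept;
    for (const auto& d : dets) {
        bool suppressed = false;
        for (const auto& k : kept) {
            if (k.classId == d.classId && iou(k, d) > iouThreshold) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) kept.push_back(d);
    }
    return kept;
}
```

Whether this actually wins depends on how many candidate boxes survive the score threshold before NMS and on the host-device transfer cost.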
-
![sam2 drawio](https://github.com/user-attachments/assets/d394623f-efd3-4c77-901d-b0f0938c9325)
I'm currently trying to deploy a video inference model for SAM2 using TensorRT+cpp. Following his ide…
-
Hello, `0.15.0.dev2024101500` introduced a new issue when using the executor API with Whisper:
```
[TensorRT-LLM][ERROR] IExecutionContext::inferShapes: Error Code 7: Internal Error (WhisperEncoder/__add_…
```
-
### System Info
TensorRT-LLM v0.13.0
### Who can help?
_No response_
### Information
- [ ] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officially supported tas…