-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and found no similar bug report.
### Ultralytics YOLO Component
Expo…
-
## Description
Hi,
I have been using the INT8 Entropy Calibrator 2 for INT8 quantization in Python and it’s been working well (TensorRT 10.0.1). The example of how I use the INT8 Entropy Calibra…
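For context on what the calibrator ultimately produces: INT8 calibration boils down to choosing a per-tensor scale that maps FP32 activations onto the int8 range. A minimal, hypothetical numpy sketch of that idea (using simple max calibration instead of entropy calibration for brevity; `calibrate_max`, `quantize`, and `dequantize` are illustrative names, not part of the TensorRT API or the original report):

```python
import numpy as np

# Illustrative sketch only -- shows the scale-based quantization that an
# INT8 calibrator's chosen scale feeds into, not the entropy algorithm itself.
def calibrate_max(batch: np.ndarray) -> float:
    """Simplest calibration: derive the scale from the batch's absolute max."""
    return float(np.abs(batch).max()) / 127.0

def quantize(x: np.ndarray, scale: float) -> np.ndarray:
    """Map FP32 values onto the symmetric int8 range [-127, 127]."""
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 values from the quantized tensor."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=(4, 64)).astype(np.float32)
scale = calibrate_max(acts)
q = quantize(acts, scale)
err = float(np.abs(dequantize(q, scale) - acts).max())
```

The entropy calibrator differs only in how it picks `scale`: rather than the absolute max, it searches for the clipping threshold that minimizes the KL divergence between the FP32 and quantized activation distributions.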
-
### System Info
- GPU: NVIDIA H100 80G
- TensorRT-LLM branch main
- TensorRT-LLM commit: 535c9cc6730f5ac999e4b1cb621402b58138f819
### Who can help?
@byshiue @Superjomn
### Information
- [x] The…
-
```dockerfile
# Base Image
FROM nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3
USER root
RUN apt update && apt install --no-install-recommends rapidjson-dev python-is-python3 git-lfs curl uuid…
-
python quantize.py --model_dir /qwen-14b-chat --dtype float16 --qformat int4_awq --export_path ./qwen_14b_4bit_gs128_awq.pt --calib_size 32
python build.py --hf_model_dir=/qwen-14b-chat/ --quant…
-
### System Info
- CPU architecture: x86_64
- CPU/Host memory size: 64GB
- GPU properties
  - GPU name: NVIDIA RTX4090
  - GPU memory size: 24GB
- Libraries
  - TensorRT-LLM branch or tag: v0.13.0
  - Versions of Tenso…
-
### System Info
- CPU architecture: x86_64
- CPU/Host memory size: 250GB total
- GPU properties
- GPU name: 2x NVIDIA A100 80GB
- GPU memory size: 160GB total
- Libraries
- tensorrt @ fi…
-
Thanks for this excellent project!
I can generate a bfloat16 model or an INT8 weight model, but when I tried the following commands:
python ./examples/llama/build.py --model_dir ./Mixtral-8x7B-Inst…
-
## Description
Problems building cuDLA models with EngineCapability::kDLA_STANDALONE.
We want to use kDLA_STANDALONE mode to run the model, but we encounter the following error when compili…
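For reference, the DLA-standalone capability is set on the builder config before building. A minimal sketch of that configuration, assuming the TensorRT Python API (which mirrors the C++ `EngineCapability::kDLA_STANDALONE`; this is a config fragment, not a claim about the reporter's exact setup, and it requires TensorRT plus a DLA-capable device such as Orin):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# Restrict the engine to operations DLA can execute standalone.
config.engine_capability = trt.EngineCapability.DLA_STANDALONE
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 0

# DLA runs in FP16 or INT8, so a reduced-precision flag is required.
config.set_flag(trt.BuilderFlag.FP16)

# Note: do NOT set trt.BuilderFlag.GPU_FALLBACK here -- a standalone DLA
# engine cannot fall back to the GPU, so every layer must be DLA-supported;
# any unsupported layer will cause the build to fail.
```

A common cause of build failures in this mode is a layer (or a shape/precision combination) outside DLA's supported set, which would otherwise silently fall back to the GPU when GPU fallback is allowed.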
-
## Description
For the quantized INT8 model, the inference results are correct under Orin DLA FP16, and the results are also correct under Orin GPU INT8, but the results are completely incorrect un…