-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and found no similar bug report.
### Ultralytics YOLO Component
Expo…
-
## Description
Hi,
I have been using the INT8 Entropy Calibrator 2 for INT8 quantization in Python and it’s been working well (TensorRT 10.0.1). The example of how I use the INT8 Entropy Calibra…
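For readers unfamiliar with the flow: TensorRT drives entropy calibration by repeatedly asking a user-supplied calibrator object for batches of representative data, and optionally caching the resulting scales. The sketch below shows only that control flow in plain Python; it is a hypothetical stand-in (in real use the class would subclass `trt.IInt8EntropyCalibrator2` and `get_batch` would return device pointers to GPU memory, e.g. via PyCUDA):

```python
import os


class EntropyCalibratorSketch:
    """Sketch of the data-feeding side of a TensorRT INT8 entropy
    calibrator. Real code subclasses trt.IInt8EntropyCalibrator2;
    this class only illustrates the batching and cache handling
    TensorRT expects."""

    def __init__(self, samples, batch_size, cache_file):
        self.samples = samples        # preprocessed calibration inputs
        self.batch_size = batch_size
        self.cache_file = cache_file
        self.index = 0

    def get_batch(self):
        # TensorRT calls this repeatedly; returning None ends calibration.
        if self.index + self.batch_size > len(self.samples):
            return None
        batch = self.samples[self.index:self.index + self.batch_size]
        self.index += self.batch_size
        return batch

    def read_calibration_cache(self):
        # Reusing a cache skips recalibration on subsequent builds.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

The calibrator is then attached to the builder config (`config.int8_calibrator = calibrator`) before building the engine.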
-
### System Info
- GPU: NVIDIA H100 80G
- TensorRT-LLM branch main
- TensorRT-LLM commit: 535c9cc6730f5ac999e4b1cb621402b58138f819
### Who can help?
@byshiue @Superjomn
### Information
- [x] The…
-
### Search before asking
- [X] I have searched the HUB [issues](https://github.com/ultralytics/hub/issues) and [discussions](https://github.com/ultralytics/hub/discussions) and found no similar quest…
-
```dockerfile
# Base Image
FROM nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3
USER root
RUN apt update && apt install -y --no-install-recommends rapidjson-dev python-is-python3 git-lfs curl uuid…
-
```shell
python quantize.py --model_dir /qwen-14b-chat --dtype float16 --qformat int4_awq --export_path ./qwen_14b_4bit_gs128_awq.pt --calib_size 32
python build.py --hf_model_dir=/qwen-14b-chat/ --quant…
```
-
### System Info
- CPU architecture: x86_64
- CPU/Host memory size: 64GB
- GPU properties
  - GPU name: NVIDIA RTX4090
  - GPU memory size: 24GB
- Libraries
  - TensorRT-LLM branch or tag: v0.13.0
  - Versions of Tenso…
-
### System Info
- CPU architecture: x86_64
- CPU/Host memory size: 250GB total
- GPU properties
- GPU name: 2x NVIDIA A100 80GB
- GPU memory size: 160GB total
- Libraries
- tensorrt @ fi…
-
Thanks for this excellent project!
I can generate a bfloat16 model or an int8 weight model, but when I tried the following commands:
python ./examples/llama/build.py --model_dir ./Mixtral-8x7B-Inst…
-
## Description
We are having problems building cuDLA models with EngineCapability::kDLA_STANDALONE.
We want to use the kDLA_STANDALONE pattern to run the model, but we encounter the following error when compili…
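For context, kDLA_STANDALONE forbids any GPU fallback, so a build fails if any layer cannot run natively on the DLA or uses a precision the DLA does not support (DLA runs FP16/INT8 only). As an illustration only (this helper is hypothetical, not part of TensorRT), a pre-check over a network's layer plan might look like:

```python
# Hypothetical pre-check: which layers would block a DLA-standalone build?
# Each layer is described by a dict: {"name", "precision", "runs_on_dla"}.
DLA_SUPPORTED_PRECISIONS = {"FP16", "INT8"}


def check_dla_standalone(layers):
    """Return a list of problems that would make a kDLA_STANDALONE
    build fail: layers needing GPU fallback, or layers using a
    precision the DLA does not support."""
    problems = []
    for layer in layers:
        if not layer["runs_on_dla"]:
            problems.append(f"{layer['name']}: requires GPU fallback")
        if layer["precision"] not in DLA_SUPPORTED_PRECISIONS:
            problems.append(
                f"{layer['name']}: precision {layer['precision']} "
                "not supported on DLA"
            )
    return problems
```

In practice the equivalent check is done by inspecting which layers TensorRT reports as falling back to the GPU when `allow_gpu_fallback` is enabled; any such layer must be removed or replaced before a standalone DLA build can succeed.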