-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and found no similar bug report.
### Ultralytics YOLO Component
Expo…
-
## Description
Hi,
I have been using the INT8 Entropy Calibrator 2 for INT8 quantization in Python and it’s been working well (TensorRT 10.0.1). The example of how I use the INT8 Entropy Calibra…
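For context on what the calibrator ultimately produces: INT8 calibration boils down to choosing a per-tensor scale that maps FP32 activations onto the int8 range. A minimal, hypothetical numpy sketch of that idea (using simple max calibration instead of entropy calibration for brevity; `calibrate_max`, `quantize`, and `dequantize` are illustrative names, not part of the TensorRT API or the original report):

```python
import numpy as np

# Illustrative sketch only -- shows the scale-based quantization that an
# INT8 calibrator's chosen scale feeds into, not the entropy algorithm itself.
def calibrate_max(batch: np.ndarray) -> float:
    """Simplest calibration: derive the scale from the batch's absolute max."""
    return float(np.abs(batch).max()) / 127.0

def quantize(x: np.ndarray, scale: float) -> np.ndarray:
    """Map FP32 values onto the symmetric int8 range [-127, 127]."""
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 values from the quantized tensor."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=(4, 64)).astype(np.float32)
scale = calibrate_max(acts)
q = quantize(acts, scale)
err = float(np.abs(dequantize(q, scale) - acts).max())
```

The entropy calibrator differs only in how it picks `scale`: rather than the absolute max, it searches for the clipping threshold that minimizes the KL divergence between the FP32 and quantized activation distributions.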
-
### System Info
- GPU: NVIDIA H100 80G
- TensorRT-LLM branch main
- TensorRT-LLM commit: 535c9cc6730f5ac999e4b1cb621402b58138f819
### Who can help?
@byshiue @Superjomn
### Information
- [x] The…
-
```dockerfile
# Base Image
FROM nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3
USER root
RUN apt update && apt install --no-install-recommends rapidjson-dev python-is-python3 git-lfs curl uuid…
-
python quantize.py --model_dir /qwen-14b-chat --dtype float16 --qformat int4_awq --export_path ./qwen_14b_4bit_gs128_awq.pt --calib_size 32
python build.py --hf_model_dir=/qwen-14b-chat/ --quant…
-
### System Info
- CPU architecture: x86_64
- CPU/Host memory size: 64GB
- GPU properties
  - GPU name: NVIDIA RTX4090
  - GPU memory size: 24GB
- Libraries
  - TensorRT-LLM branch or tag: v0.13.0
  - Versions of Tenso…
-
### System Info
- CPU architecture: x86_64
- CPU/Host memory size: 250GB total
- GPU properties
- GPU name: 2x NVIDIA A100 80GB
- GPU memory size: 160GB total
- Libraries
- tensorrt @ fi…
-
Thanks for this excellent project!
I can generate a bfloat16 model or an INT8 weight model, but when I tried the following commands:
python ./examples/llama/build.py --model_dir ./Mixtral-8x7B-Inst…
-
## Description
Problems building cuDLA models with EngineCapability::kDLA_STANDALONE.
We want to use kDLA_STANDALONE mode to run the model, but we encounter the following error when compili…
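For reference, the DLA-standalone capability is set on the builder config before building. A minimal sketch of that configuration, assuming the TensorRT Python API (which mirrors the C++ `EngineCapability::kDLA_STANDALONE`; this is a config fragment, not a claim about the reporter's exact setup, and it requires TensorRT plus a DLA-capable device such as Orin):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# Restrict the engine to operations DLA can execute standalone.
config.engine_capability = trt.EngineCapability.DLA_STANDALONE
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 0

# DLA runs in FP16 or INT8, so a reduced-precision flag is required.
config.set_flag(trt.BuilderFlag.FP16)

# Note: do NOT set trt.BuilderFlag.GPU_FALLBACK here -- a standalone DLA
# engine cannot fall back to the GPU, so every layer must be DLA-supported;
# any unsupported layer will cause the build to fail.
```

A common cause of build failures in this mode is a layer (or a shape/precision combination) outside DLA's supported set, which would otherwise silently fall back to the GPU when GPU fallback is allowed.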
-
## Description
For the quantized INT8 model, the inference results are correct under Orin DLA FP16, and the results are also correct under Orin GPU INT8, but the results are completely incorrect un…