TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
I am trying vlm_ptq by following the README in the vlm_ptq folder. When I run the command "scripts/huggingface_example.sh --type llava --model llava-1.5-7b-hf --quant fp8 --tp 8", the following error is reported:
hf_ptq.py: error: unrecognized arguments: --deployment=
I have also tried hard-coding DEPLOYMENT="tensorrt_llm" in huggingface_example.sh, but I still get the error:
hf_ptq.py: error: unrecognized arguments: --deployment=tensorrt-llm
Is this a bug in llm_ptq or a bug in huggingface_example.sh?
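For reference, a workaround I am considering is to drop the flag from the script entirely before the hf_ptq.py call. This is only a sketch, assuming huggingface_example.sh forwards the value verbatim as --deployment=$DEPLOYMENT (the exact spelling inside the script may differ):

```bash
# Assumption: huggingface_example.sh builds the hf_ptq.py call with "--deployment=$DEPLOYMENT".
# Remove that token (keeping a .bak copy) so hf_ptq.py only receives arguments it recognizes.
sed -i.bak 's/--deployment=\$DEPLOYMENT//g' scripts/huggingface_example.sh

# Then re-run the original command:
scripts/huggingface_example.sh --type llava --model llava-1.5-7b-hf --quant fp8 --tp 8
```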
I am using modelopt 0.17.0, installed with: pip install "nvidia-modelopt[all]" --extra-index-url https://pypi.nvidia.com
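For completeness, the mismatch can be checked by comparing the installed release against the arguments hf_ptq.py actually accepts. The relative path to hf_ptq.py below is my assumption about the example layout; adjust it to wherever hf_ptq.py lives in your checkout:

```bash
# Show the installed Model Optimizer version (0.17.0 in my case)
pip show nvidia-modelopt

# List the arguments the installed hf_ptq.py accepts; if --deployment is not
# listed here, huggingface_example.sh is out of sync with this release.
python ../llm_ptq/hf_ptq.py --help
```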