NVIDIA / TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, and distillation. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
https://nvidia.github.io/TensorRT-Model-Optimizer

unrecognized arguments: --deployment #90

Open relaxtheo opened 2 weeks ago

relaxtheo commented 2 weeks ago

I am trying vlm_ptq by following the README in the vlm_ptq folder. When I run the command "scripts/huggingface_example.sh --type llava --model llava-1.5-7b-hf --quant fp8 --tp 8", the following error is reported:

hf_ptq.py: error: unrecognized arguments: --deployment=

I have tried hard-coding DEPLOYMENT="tensorrt_llm" in huggingface_example.sh, and I still get the error: hf_ptq.py: error: unrecognized arguments: --deployment=tensorrt-llm

Is this a bug in llm_ptq or in huggingface_example.sh?

I am using ModelOpt 0.17.0, installed with the command: pip install "nvidia-modelopt[all]" --extra-index-url https://pypi.nvidia.com

cjluo-omniml commented 2 days ago

Hi @relaxtheo, could you try the latest release?
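
For reference, a minimal upgrade sketch using the same install channel you mentioned (adjust the extras and index URL to your setup):

pip install -U "nvidia-modelopt[all]" --extra-index-url https://pypi.nvidia.com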

Also, we've deprecated the DEPLOYMENT flag in favor of the export_fmt flag.

See: https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/vlm_ptq/scripts/huggingface_example.sh#L155
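
For illustration, the invocation from the original report would then look roughly like this; this is a sketch assuming the script accepts an --export_fmt argument taking a tensorrt_llm value in place of the old flag:

scripts/huggingface_example.sh --type llava --model llava-1.5-7b-hf --quant fp8 --tp 8 --export_fmt tensorrt_llm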