NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

【bloom】convert_checkpoint.py local variable 'int8_weights' referenced before assignment #741

Closed scarydemon2 closed 1 week ago

scarydemon2 commented 11 months ago

I followed the README:

Build model with both INT8 weight-only and INT8 KV cache enabled

```bash
python convert_checkpoint.py --model_dir ./bloom/560m/ \
    --dtype float16 \
    --int8_kv_cache \
    --use_weight_only --output_dir ./bloom/560m/trt_ckpt/int8/1-gpu/

trtllm-build --checkpoint_dir ./bloom/560m/trt_ckpt/int8/1-gpu/ \
    --use_gemm_plugin float16 \
    --use_gpt_attention_plugin float16 \
    --output_dir ./bloom/560m/trt_engines/int8/1-gpu/
```

and my script is

```bash
python convert_checkpoint.py --model_dir ./Bloomz_QA+alpaca_gpt4_zh+lima_V3 \
    --dtype float16 \
    --int8_kv_cache \
    --use_weight_only --output_dir ./Bloomz_QA+alpaca_gpt4_zh+lima_V3/trt_ckpt/int8/1-gpu/

trtllm-build --checkpoint_dir ./Bloomz_QA+alpaca_gpt4_zh+lima_V3//trt_ckpt/int8/1-gpu/ \
    --use_gemm_plugin float16 \
    --use_gpt_attention_plugin float16 \
    --output_dir ./Bloomz_QA+alpaca_gpt4_zh+lima_V3/trt_engines/int8/1-gpu/
```

and I got

```
Traceback (most recent call last):
  File "/workspace/TensorRT-LLM/examples/bloom/convert_checkpoint.py", line 899, in <module>
    weights = convert_hf_bloom(
  File "/workspace/TensorRT-LLM/examples/bloom/convert_checkpoint.py", line 668, in convert_hf_bloom
    np.array([1.0 / int8_weights['scale_y_quant_orig']],
UnboundLocalError: local variable 'int8_weights' referenced before assignment
```

The code in convert_checkpoint.py shows that when use_smooth_quant == False, int8_weights is never computed, yet the INT8 KV-cache path still references it.
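
For illustration, here is a minimal sketch of that failing pattern (paraphrased, not the actual TensorRT-LLM source; `fake_generate_int8` is a hypothetical stand-in for the real calibration helper): with only `--int8_kv_cache` set, the branch that binds `int8_weights` is skipped, but its `scale_y_quant_orig` entry is still read.

```python
# Sketch of the failing control flow (paraphrased, not the real source).
import numpy as np

def fake_generate_int8(activations):
    # Hypothetical stand-in for the real per-tensor calibration helper.
    return {'scale_y_quant_orig': np.float32(np.abs(activations).max() / 127.0)}

def convert_layer(activations, use_smooth_quant, int8_kv_cache):
    if use_smooth_quant:
        int8_weights = fake_generate_int8(activations)   # only bound here

    if int8_kv_cache:
        # With --int8_kv_cache but without smooth quant, int8_weights was
        # never assigned, so this line raises UnboundLocalError.
        return np.array([1.0 / int8_weights['scale_y_quant_orig']],
                        dtype=np.float32)

# Reproduces the reported error:
# convert_layer(np.random.randn(16), use_smooth_quant=False, int8_kv_cache=True)
```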

nv-guomingz commented 11 months ago

Hi @scarydemon2, thanks for reporting this issue. The fix has been upstreamed to the main branch. Please give it a try.
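
For anyone still on an older checkout, the general shape of such a fix is to compute the calibration scales whenever any consumer needs them, not only on the smooth-quant path. A hedged sketch, reusing the stand-in helper from the sketch above (the actual change on the main branch may differ):

```python
def convert_layer_fixed(activations, use_smooth_quant, int8_kv_cache):
    int8_weights = None
    if use_smooth_quant or int8_kv_cache:
        # Calibrate whenever either the smooth-quant or the INT8 KV-cache
        # path will read the activation scales.
        int8_weights = fake_generate_int8(activations)

    if int8_kv_cache:
        return np.array([1.0 / int8_weights['scale_y_quant_orig']],
                        dtype=np.float32)
```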