ModelTC / lightllm

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Qwen-14B-INT8 hits the issue: 'QwenTransformerLayerWeight' object has no attribute 'q_weight_' #333

Open wangr0031 opened 8 months ago

wangr0031 commented 8 months ago

In a container created from the image ghcr.io/modeltc/lightllm:main, started with the command below:

docker run -d --privileged --runtime nvidia --gpus all -p 9012:8000 \
-v /root/lightllm/:/app/ \
-v /root/models/:/data/ \
--name lightllm-qwen \
--entrypoint sleep ghcr.io/modeltc/lightllm:main infinity

Command used to start the LLM server:

python -m lightllm.server.api_server --model_dir /data/Qwen-14B-Chat-Int8 --tp 1 --max_total_token_num 10240 --trust_remote_code --tokenizer_mode=auto --eos_id 151643

The process then reports the error below:

root@0577e1aecb3d:~# python -m lightllm.server.api_server --model_dir /data/Qwen-14B-Chat-Int8 --tp 1 --max_total_token_num 10240 --trust_remote_code --tokenizer_mode=auto --eos_id 151643

INFO 02-20 06:53:17 [tokenizer.py:78] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
INFO 02-20 06:53:22 [tokenizer.py:78] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
ERROR 02-20 06:53:41 [model_rpc.py:146] load model error: 'QwenTransformerLayerWeight' object has no attribute 'q_weight_' 'QwenTransformerLayerWeight' object has no attribute 'q_weight_' <class 'AttributeError'>
Traceback (most recent call last):
  File "/lightllm/lightllm/server/router/model_infer/model_rpc.py", line 105, in exposed_init_model
    self.model = QWenTpPartModel(model_kvargs)
  File "/lightllm/lightllm/models/qwen/model.py", line 28, in __init__
    super().__init__(kvargs)
  File "/lightllm/lightllm/models/llama/model.py", line 35, in __init__
    super().__init__(kvargs)
  File "/lightllm/lightllm/common/basemodel/basemodel.py", line 50, in __init__
    self._init_weights()
  File "/lightllm/lightllm/models/llama/model.py", line 99, in _init_weights
    [weight.verify_load() for weight in self.trans_layers_weight]
  File "/lightllm/lightllm/models/llama/model.py", line 99, in <listcomp>
    [weight.verify_load() for weight in self.trans_layers_weight]
  File "/lightllm/lightllm/models/qwen/layer_weights/transformer_layer_weight.py", line 82, in verify_load
    self.q_weight_,
AttributeError: 'QwenTransformerLayerWeight' object has no attribute 'q_weight_'
Process Process-1:
ERROR 02-20 06:53:41 [start_utils.py:24] init func start_router_process : Traceback (most recent call last):
ERROR 02-20 06:53:41 [start_utils.py:24] 
ERROR 02-20 06:53:41 [start_utils.py:24]   File "/lightllm/lightllm/server/router/manager.py", line 379, in start_router_process
ERROR 02-20 06:53:41 [start_utils.py:24]     asyncio.run(router.wait_to_model_ready())
ERROR 02-20 06:53:41 [start_utils.py:24] 
ERROR 02-20 06:53:41 [start_utils.py:24]   File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
ERROR 02-20 06:53:41 [start_utils.py:24]     return loop.run_until_complete(main)
ERROR 02-20 06:53:41 [start_utils.py:24] 
ERROR 02-20 06:53:41 [start_utils.py:24]   File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
ERROR 02-20 06:53:41 [start_utils.py:24] 
ERROR 02-20 06:53:41 [start_utils.py:24]   File "/lightllm/lightllm/server/router/manager.py", line 83, in wait_to_model_ready
ERROR 02-20 06:53:41 [start_utils.py:24]     await asyncio.gather(*init_model_ret)
ERROR 02-20 06:53:41 [start_utils.py:24] 
ERROR 02-20 06:53:41 [start_utils.py:24]   File "/lightllm/lightllm/server/router/model_infer/model_rpc.py", line 395, in init_model
ERROR 02-20 06:53:41 [start_utils.py:24]     ans : rpyc.AsyncResult = self._init_model(kvargs)
ERROR 02-20 06:53:41 [start_utils.py:24] 
ERROR 02-20 06:53:41 [start_utils.py:24]   File "/lightllm/lightllm/server/router/model_infer/model_rpc.py", line 149, in exposed_init_model
ERROR 02-20 06:53:41 [start_utils.py:24]     raise e
ERROR 02-20 06:53:41 [start_utils.py:24] 
ERROR 02-20 06:53:41 [start_utils.py:24]   File "/lightllm/lightllm/server/router/model_infer/model_rpc.py", line 105, in exposed_init_model
ERROR 02-20 06:53:41 [start_utils.py:24]     self.model = QWenTpPartModel(model_kvargs)
ERROR 02-20 06:53:41 [start_utils.py:24] 
ERROR 02-20 06:53:41 [start_utils.py:24]   File "/lightllm/lightllm/models/qwen/model.py", line 28, in __init__
ERROR 02-20 06:53:41 [start_utils.py:24]     super().__init__(kvargs)
ERROR 02-20 06:53:41 [start_utils.py:24] 
ERROR 02-20 06:53:41 [start_utils.py:24]   File "/lightllm/lightllm/models/llama/model.py", line 35, in __init__
ERROR 02-20 06:53:41 [start_utils.py:24]     super().__init__(kvargs)
ERROR 02-20 06:53:41 [start_utils.py:24] 
ERROR 02-20 06:53:41 [start_utils.py:24]   File "/lightllm/lightllm/common/basemodel/basemodel.py", line 50, in __init__
ERROR 02-20 06:53:41 [start_utils.py:24]     self._init_weights()
ERROR 02-20 06:53:41 [start_utils.py:24] 
ERROR 02-20 06:53:41 [start_utils.py:24]   File "/lightllm/lightllm/models/llama/model.py", line 99, in _init_weights
ERROR 02-20 06:53:41 [start_utils.py:24]     [weight.verify_load() for weight in self.trans_layers_weight]
ERROR 02-20 06:53:41 [start_utils.py:24] 
ERROR 02-20 06:53:41 [start_utils.py:24]   File "/lightllm/lightllm/models/llama/model.py", line 99, in <listcomp>
ERROR 02-20 06:53:41 [start_utils.py:24]     [weight.verify_load() for weight in self.trans_layers_weight]
ERROR 02-20 06:53:41 [start_utils.py:24] 
ERROR 02-20 06:53:41 [start_utils.py:24]   File "/lightllm/lightllm/models/qwen/layer_weights/transformer_layer_weight.py", line 82, in verify_load
ERROR 02-20 06:53:41 [start_utils.py:24]     self.q_weight_,
ERROR 02-20 06:53:41 [start_utils.py:24] 
ERROR 02-20 06:53:41 [start_utils.py:24] AttributeError: 'QwenTransformerLayerWeight' object has no attribute 'q_weight_'
ERROR 02-20 06:53:41 [start_utils.py:24] 
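
For context, the failure comes from verify_load touching self.q_weight_, which was evidently never assigned during weight loading. Below is a minimal sketch of that load/verify pattern. It is not lightllm's actual code; the key name is a hypothetical illustration. The assumption is that the loader only assigns q_weight_ when it finds the plain fp16 attention weight under the expected name, while an Int8/GPTQ-style checkpoint stores quantized tensors under different names, so the attribute is never set and verify_load raises.

# Minimal sketch (not lightllm's code) of the load/verify pattern implied by the traceback.
class TransformerLayerWeightSketch:
    def load_hf_weights(self, weights):
        # Hypothetical fp16 key, used only for illustration.
        key = "transformer.h.0.attn.c_attn.weight"
        if key in weights:
            # Only assigned when the plain fp16 tensor is present.
            self.q_weight_ = weights[key]

    def verify_load(self):
        # Mirrors the failing check in the traceback: touching an attribute
        # that was never assigned raises AttributeError.
        return [self.q_weight_]


layer = TransformerLayerWeightSketch()
# An Int8/GPTQ-style checkpoint ships quantized tensors under other names
# (e.g. "...c_attn.qweight"), so the fp16 branch above never runs.
layer.load_hf_weights({"transformer.h.0.attn.c_attn.qweight": object()})
layer.verify_load()  # AttributeError: ... has no attribute 'q_weight_'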

GPU (nvidia-smi output):

Tue Feb 20 15:04:21 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:00:05.0 Off |                  Off |
| 30%   45C    P0              N/A / 450W |      2MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
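
A quick way to confirm whether the checkpoint layout matches what the fp16 loader expects is to list the tensor names it actually contains; a GPTQ/Int8-style export typically stores qweight/qzeros/scales tensors rather than plain weights. The diagnostic sketch below makes assumptions about how the checkpoint shards are saved (safetensors and/or .bin files in the model directory) and only prints keys for the first transformer layer.

# Diagnostic sketch: list tensor names stored in the checkpoint.
import glob
import torch
from safetensors import safe_open

model_dir = "/data/Qwen-14B-Chat-Int8"  # path from the start command above

names = set()
for path in glob.glob(f"{model_dir}/*.safetensors"):
    with safe_open(path, framework="pt") as f:
        names.update(f.keys())
for path in glob.glob(f"{model_dir}/*.bin"):
    names.update(torch.load(path, map_location="cpu").keys())

# Printing only the first transformer layer keeps the output short.
for name in sorted(names):
    if ".h.0." in name:
        print(name)
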
QuanhuiGuan commented 5 months ago

I met the same issue. Could anyone give me some suggestions on how to fix this?

wxjttxs commented 3 weeks ago

Me too. How can I solve this problem?