casper-hansen / AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
https://casper-hansen.github.io/AutoAWQ/
MIT License

ValueError: scales is on the meta device, we need a `value` to put in on 0. #258

Closed: yixuantt closed this issue 9 months ago

yixuantt commented 10 months ago

Hi, I am doing some quantization work with AWQ. There are no issues when quantizing Llama with AutoAWQ. However, I ran into a problem when quantizing Falcon 40B. The quantization code for Llama and Falcon is the same, following the code provided in the README. The error occurs with Falcon during inference.

Replacing layers...: 100%|████████████████████████████████████████████████████████████████████████████| 60/60 [00:03<00:00, 16.93it/s]
Traceback (most recent call last):
  File "test.py", line 59, in <module>
    model = AutoAWQForCausalLM.from_quantized(model_path)
  File "/root/miniconda3/lib/python3.8/site-packages/awq/models/auto.py", line 52, in from_quantized
    return AWQ_CAUSAL_LM_MODEL_MAP[model_type].from_quantized(
  File "/root/miniconda3/lib/python3.8/site-packages/awq/models/base.py", line 171, in from_quantized
    load_checkpoint_and_dispatch(
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/big_modeling.py", line 556, in load_checkpoint_and_dispatch
    return dispatch_model(
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/big_modeling.py", line 396, in dispatch_model
    attach_align_device_hook_on_blocks(
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/hooks.py", line 547, in attach_align_device_hook_on_blocks
    attach_align_device_hook_on_blocks(
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/hooks.py", line 547, in attach_align_device_hook_on_blocks
    attach_align_device_hook_on_blocks(
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/hooks.py", line 547, in attach_align_device_hook_on_blocks
    attach_align_device_hook_on_blocks(
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/hooks.py", line 517, in attach_align_device_hook_on_blocks
    add_hook_to_module(module, hook)
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/hooks.py", line 156, in add_hook_to_module
    module = hook.init_hook(module)
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/hooks.py", line 254, in init_hook
    set_module_tensor_to_device(module, name, self.execution_device)
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 281, in set_module_tensor_to_device
    raise ValueError(f"{tensor_name} is on the meta device, we need a `value` to put in on {device}.")
ValueError: scales is on the meta device, we need a `value` to put in on 0.

My quant_config:

{
    "zero_point": true,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM"
}
casper-hansen commented 10 months ago

Hi @yixuantt. Thanks for trying AutoAWQ - for Falcon, you need a group size of 64. Can you try that, please?
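For reference, the suggested fix amounts to changing a single key in the config posted above. A minimal sketch - the surrounding quantize calls are assumed to follow the README flow and are shown only as comments, with `model_path` and `quant_path` as hypothetical placeholders:

```python
# Same quant_config as posted above, with q_group_size lowered from 128 to 64,
# per the maintainer's suggestion for Falcon.
quant_config = {
    "zero_point": True,
    "q_group_size": 64,  # was 128; use 64 for Falcon
    "w_bit": 4,
    "version": "GEMM",
}

# Then used as in the README-style flow (not executed here):
# model = AutoAWQForCausalLM.from_pretrained(model_path)
# model.quantize(tokenizer, quant_config=quant_config)
# model.save_quantized(quant_path)
```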