casper-hansen / AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
https://casper-hansen.github.io/AutoAWQ/
MIT License

ZeRO-3 not supported? #430

Open ghost opened 7 months ago

ghost commented 7 months ago

Hi, I tried to load a quantized AWQ model with DeepSpeed ZeRO-3 and hit the following error:

  File "/workspace/code/utils.py", line 61, in create_and_prepare_model
    model = AutoAWQForCausalLM.from_quantized(
  File "/usr/local/lib/python3.10/dist-packages/awq/models/auto.py", line 95, in from_quantized
    return AWQ_CAUSAL_LM_MODEL_MAP[model_type].from_quantized(
  File "/usr/local/lib/python3.10/dist-packages/awq/models/base.py", line 410, in from_quantized
    model = target_cls.from_config(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 437, in from_config
    return model_class._from_config(config, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 1318, in _from_config
    model = cls(config, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/partition_parameters.py", line 503, in wrapper
    f(module, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1136, in __init__
    self.model = LlamaModel(config)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/partition_parameters.py", line 503, in wrapper
    f(module, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 940, in __init__
    self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/partition_parameters.py", line 513, in wrapper
    self._post_init_method(module)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/partition_parameters.py", line 1051, in _post_init_method
    param.data = param.data.to(self.local_device)
NotImplementedError: Cannot copy out of meta tensor; no data!
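The root cause is visible in the last frames: DeepSpeed's ZeRO-3 post-init hook tries to copy each newly created parameter to the local device, but the model skeleton is built with parameters on the meta device, which have shape and dtype but no underlying storage. A minimal stand-alone reproduction of that copy failure (independent of AutoAWQ and DeepSpeed):

```python
import torch
import torch.nn as nn

# Parameters created on the "meta" device carry shape/dtype metadata only,
# with no backing storage, so they cannot be copied to a real device.
emb = nn.Embedding(10, 4, device="meta")

try:
    # This mirrors what DeepSpeed's _post_init_method does:
    # param.data = param.data.to(self.local_device)
    emb.weight.data = emb.weight.data.to("cpu")
except NotImplementedError as e:
    print(type(e).__name__)  # prints: NotImplementedError
```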

The loading code is:

model = AutoAWQForCausalLM.from_quantized(
    args.model_name_or_path,
    max_seq_len=data_args.max_seq_length,
    fuse_layers=False,
    trust_remote_code=True,
    low_cpu_mem_usage=False,
)

and the accelerate/DeepSpeed config is:

compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  deepspeed_multinode_launcher: standard
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
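For reference: with `zero3_init_flag: true`, accelerate wraps model construction in DeepSpeed's `zero.Init` context, which is the `partition_parameters.py` wrapper visible in the traceback. A hypothetical (untested) mitigation is to disable that init hook so modules are constructed normally rather than under ZeRO-3 partitioning; note, however, the maintainer's reply that DeepSpeed is not supported with AutoAWQ at all:

```yaml
deepspeed_config:
  zero3_init_flag: false  # hypothetical: skip zero.Init so module __init__ is not intercepted
```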

Is ZeRO-3 supported with AutoAWQ?

casper-hansen commented 7 months ago

DeepSpeed is not supported with AutoAWQ. We use accelerate.

ghost commented 7 months ago

Could you please share your accelerate config? What kind of parallelism are you using? Only DP (data parallelism)?