Please check that this issue hasn't been reported before.
[X] I searched previous Bug Reports and didn't find any similar reports.
Expected Behavior
ZeRO-2 loads the weights correctly, but I'm limited to using ZeRO-3 instead of ZeRO-2 due to graphics card constraints. I expect ZeRO-3 to load the weights as well.
Current behaviour
Running with torchrun --standalone --master_port 37229 --nproc_per_node=9 axolotl/cli/train.py ../../../config.yml (launched together with accelerate) fails with:
NotImplementedError: Cannot copy out of meta tensor; no data!
Traceback (most recent call last):
File "/home/omegarig30/axolotl/axolotl/src/axolotl/cli/train.py", line 38, in <module>
fire.Fire(do_cli)
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/omegarig30/axolotl/axolotl/src/axolotl/cli/train.py", line 34, in do_cli
train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
File "/home/omegarig30/axolotl/src/axolotl/train.py", line 80, in train
model, peft_config = load_model(cfg, tokenizer, inference=cli_args.inference)
File "/home/omegarig30/axolotl/src/axolotl/utils/models.py", line 624, in load_model
raise err
File "/home/omegarig30/axolotl/src/axolotl/utils/models.py", line 616, in load_model
model = AutoModelForCausalLM.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
return model_class.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3850, in from_pretrained
) = cls._load_pretrained_model(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 4284, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 839, in _load_state_dict_into_meta_model
set_module_quantized_tensor_to_device(model, param_name, param_device, value=param)
File "/usr/local/lib/python3.10/dist-packages/transformers/integrations/bitsandbytes.py", line 128, in set_module_quantized_tensor_to_device
new_value = value.to(device)
NotImplementedError: Cannot copy out of meta tensor; no data!
Traceback (most recent call last):
File "/home/omegarig30/axolotl/axolotl/src/axolotl/cli/train.py", line 38, in <module>
fire.Fire(do_cli)
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/omegarig30/axolotl/axolotl/src/axolotl/cli/train.py", line 34, in do_cli
train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
File "/home/omegarig30/axolotl/src/axolotl/train.py", line 80, in train
model, peft_config = load_model(cfg, tokenizer, inference=cli_args.inference)
File "/home/omegarig30/axolotl/src/axolotl/utils/models.py", line 624, in load_model
raise err
File "/home/omegarig30/axolotl/src/axolotl/utils/models.py", line 616, in load_model
model = AutoModelForCausalLM.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
return model_class.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3850, in from_pretrained
) = cls._load_pretrained_model(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 4284, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 839, in _load_state_dict_into_meta_model
set_module_quantized_tensor_to_device(model, param_name, param_device, value=param)
File "/usr/local/lib/python3.10/dist-packages/transformers/integrations/bitsandbytes.py", line 128, in set_module_quantized_tensor_to_device
new_value = value.to(device)
NotImplementedError: Cannot copy out of meta tensor; no data!
[2024-02-02 18:19:16,538] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 22965 closing signal SIGTERM
[2024-02-02 18:19:17,053] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 1 (pid: 22966) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/local/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 812, in main
run(args)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 803, in run
elastic_launch(
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 135, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
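For context on the error itself: under ZeRO-3 initialization, parameters are created on PyTorch's meta device (shape and dtype only, no storage), and the bitsandbytes quantization path then tries to materialize them with value.to(device), which has no data to copy. The mechanism can be sketched with a stdlib-only analogy (MetaTensor is an illustrative stand-in, not a torch class):

```python
# Conceptual, stdlib-only analogy of the failure. Illustrative names only;
# this is NOT torch code, just the shape of the problem.

class MetaTensor:
    """A placeholder tensor that records its shape but holds no data,
    like a torch tensor on the 'meta' device during ZeRO-3 init."""
    def __init__(self, shape):
        self.shape = shape
        self.data = None  # no storage allocated

    def to(self, device):
        # Copying to a real device requires real data; a meta tensor has none.
        if self.data is None:
            raise NotImplementedError("Cannot copy out of meta tensor; no data!")
        return self

# What set_module_quantized_tensor_to_device effectively attempts:
param = MetaTensor((4096, 4096))
try:
    param.to("cuda:0")
except NotImplementedError as e:
    print(e)  # -> Cannot copy out of meta tensor; no data!
```

The real fix is to ensure the weights are materialized (or loaded via DeepSpeed's own ZeRO-3 init path) before anything calls .to() on them; the sketch only shows why the copy cannot succeed.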
Steps to reproduce
Just run the command above.
Config yaml
base_model: cognitivecomputations/dolphin-2.7-mixtral-8x7b
model_type: MixtralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: false

load_in_8bit: false
load_in_4bit: true
strict: false
device_map: null
model_config:
  output_router_logits: false

datasets:

dataset_prepared_path:
val_set_size: 0.05
eval_sample_packing: false
output_dir: /home/omegarig30/models/0dai_mixtral
resume_from_checkpoint:
hf_use_auth_token:

adapter: qlora
lora_model_dir:

sequence_len: 16384
sample_packing: true
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
lora_modules_to_save:

wandb_project: 0dai_mixtral1
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 3
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.0002
torch_compile: false
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
eval_steps: 0.1
save_steps: 0.1
save_total_limit: 2
eval_sample_packing: true
debug:
deepspeed: /home/omegarig30/axolotl/deepspeed_configs/zero3_bf16.json
weight_decay: 0.001
special_tokens:
  eos_token: "<|im_end|>"
tokens:
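If it helps triage: this config combines load_in_4bit (QLoRA via bitsandbytes) with a ZeRO-3 deepspeed config, a combination that has been reported to hit exactly this meta-tensor error. A hypothetical pre-flight check (not axolotl code; the function and cfg keys mirror the YAML above but the check itself is illustrative) could flag it before launch:

```python
# Hypothetical sanity check (not part of axolotl): warn when bitsandbytes
# 4-bit loading is combined with a DeepSpeed ZeRO stage-3 config, since
# ZeRO-3 meta-device init can leave quantized weights with no data to copy.

def check_qlora_zero3(cfg: dict) -> list:
    """Return human-readable warnings for known-problematic option combos."""
    warnings = []
    deepspeed_cfg = str(cfg.get("deepspeed") or "")
    if cfg.get("load_in_4bit") and "zero3" in deepspeed_cfg:
        warnings.append(
            "load_in_4bit (QLoRA) with a ZeRO-3 deepspeed config may fail to "
            "materialize quantized weights (meta tensor error); consider "
            "ZeRO-2 or loading without quantization."
        )
    return warnings

# Mirroring the relevant keys from the config above:
cfg = {
    "load_in_4bit": True,
    "deepspeed": "/home/omegarig30/axolotl/deepspeed_configs/zero3_bf16.json",
}
for warning in check_qlora_zero3(cfg):
    print("WARNING:", warning)
```

Running this against the posted config would emit one warning; with a zero2 config path or load_in_4bit: false it stays silent.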
Possible solution
No response
Which Operating Systems are you using?
Python Version
3.10
axolotl branch-commit
Latest version
Acknowledgements