Closed: risedangel closed this issue 9 months ago.
I also have this issue with Mistral 7B Instruct v0.2; no problem with the v0.1 model.
May I ask if you could try running with plain accelerate, since you only have a single GPU?
I have the exact same issue with Instruct v0.2. In my case it happened using the code from the repo (not Docker) and the stock QLoRA example config with accelerate (no DeepSpeed or any other edits to the yml). So the one constant that breaks everything appears to be the model version: the error keeps happening even when everything else is different.
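A quick way to see why the model version is the deciding factor (a sketch, assuming the `transformers` package is installed and both checkpoints are reachable): the v0.2 config.json ships `"sliding_window": null`, while v0.1 ships 4096, so any code expecting an int here receives None for v0.2.

```python
# Compare the sliding_window setting of the two checkpoints.
# v0.1 reports 4096; v0.2 reports None, which is what later trips up the
# int-typed mask helper in the flash-attention monkeypatch.
from transformers import AutoConfig

for name in ("mistralai/Mistral-7B-Instruct-v0.1",
             "mistralai/Mistral-7B-Instruct-v0.2"):
    cfg = AutoConfig.from_pretrained(name)
    print(name, "->", cfg.sliding_window)
```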
Hi @Nirogu @risedangel @Nondzu. I encountered this issue previously, fixed it, and closed the issue. I wasn't aware there was a duplicate. Please see https://github.com/OpenAccess-AI-Collective/axolotl/issues/1047
While this makes the training work, it effectively re-adds the sliding window, which is undesirable.
Thanks to Dream's PR, you no longer need my earlier override; you can use it as-is.
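For context, the override being referred to presumably amounts to forcing a concrete sliding_window value back onto the model config before training starts; in plain transformers terms it is roughly the following (an illustrative sketch, not the exact snippet from this thread):

```python
# Illustrative only: pinning the window back to 4096 satisfies the int-typed
# mask helper, but it also re-enables sliding-window attention, which is
# exactly what v0.2 removed, hence "undesirable".
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
if config.sliding_window is None:
    config.sliding_window = 4096  # value assumed from the helper's 4096 default

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2", config=config
)
```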
Please check that this issue hasn't been reported before.
Expected Behavior
I would expect fine-tuning to start without a problem.
Current behaviour
It fails and throws an error. Here is the output:
[2023-12-16 22:47:07,603] [WARNING] [axolotl.scripts.check_user_token:358] [PID:441] [RANK:0] Error verifying HuggingFace token. Remember to log in using huggingface-cli login and get your access token from https://huggingface.co/settings/tokens if you want to use gated models or datasets.
[2023-12-16 22:47:07,841] [DEBUG] [axolotl.load_tokenizer:167] [PID:441] [RANK:0] EOS: 2 /
[2023-12-16 22:47:07,841] [DEBUG] [axolotl.load_tokenizer:168] [PID:441] [RANK:0] BOS: 1 /
[2023-12-16 22:47:07,841] [DEBUG] [axolotl.load_tokenizer:169] [PID:441] [RANK:0] PAD: 2 /
[2023-12-16 22:47:07,841] [DEBUG] [axolotl.load_tokenizer:170] [PID:441] [RANK:0] UNK: 0 /
[2023-12-16 22:47:07,841] [INFO] [axolotl.load_tokenized_prepared_datasets:143] [PID:441] [RANK:0] Loading prepared dataset from disk at last_run_prepared/f98d5b0b00654992f42fe72d04d0e1f1...
[2023-12-16 22:47:07,843] [INFO] [axolotl.load_tokenized_prepared_datasets:145] [PID:441] [RANK:0] Prepared dataset loaded from disk...
Filter (num_proc=12): 100%|█████| 49318/49318 [00:01<00:00, 37641.02 examples/s]
Filter (num_proc=12): 100%|███████| 2596/2596 [00:00<00:00, 14676.47 examples/s]
Map (num_proc=12): 100%|████████| 49318/49318 [00:01<00:00, 31290.48 examples/s]
Map (num_proc=12): 100%|██████████| 2596/2596 [00:00<00:00, 10745.16 examples/s]
[2023-12-16 22:47:11,573] [DEBUG] [axolotl.log:60] [PID:441] [RANK:0] total_num_tokens: 583433
[2023-12-16 22:47:11,586] [DEBUG] [axolotl.log:60] [PID:441] [RANK:0] total_supervised_tokens: 355619
[2023-12-16 22:47:14,426] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:441] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 583433
[2023-12-16 22:47:14,426] [DEBUG] [axolotl.log:60] [PID:441] [RANK:0] data_loader_len: 34
[2023-12-16 22:47:14,426] [INFO] [axolotl.log:60] [PID:441] [RANK:0] sample_packing_eff_est across ranks: [0.9891645643446181]
[2023-12-16 22:47:14,426] [DEBUG] [axolotl.log:60] [PID:441] [RANK:0] sample_packing_eff_est: None
[2023-12-16 22:47:14,426] [DEBUG] [axolotl.log:60] [PID:441] [RANK:0] total_num_steps: 34
[2023-12-16 22:47:14,454] [DEBUG] [axolotl.log:60] [PID:441] [RANK:0] total_num_tokens: 10877620
[2023-12-16 22:47:14,671] [DEBUG] [axolotl.log:60] [PID:441] [RANK:0] total_supervised_tokens: 6550004
[2023-12-16 22:47:14,776] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:441] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 10877620
[2023-12-16 22:47:14,777] [DEBUG] [axolotl.log:60] [PID:441] [RANK:0] data_loader_len: 656
[2023-12-16 22:47:14,777] [INFO] [axolotl.log:60] [PID:441] [RANK:0] sample_packing_eff_est across ranks: [0.9894444654666542]
[2023-12-16 22:47:14,777] [DEBUG] [axolotl.log:60] [PID:441] [RANK:0] sample_packing_eff_est: 0.99
[2023-12-16 22:47:14,777] [DEBUG] [axolotl.log:60] [PID:441] [RANK:0] total_num_steps: 656
[2023-12-16 22:47:14,781] [DEBUG] [axolotl.train.log:60] [PID:441] [RANK:0] loading tokenizer... mistralai/Mistral-7B-Instruct-v0.2
[2023-12-16 22:47:15,026] [DEBUG] [axolotl.load_tokenizer:167] [PID:441] [RANK:0] EOS: 2 /
[2023-12-16 22:47:15,026] [DEBUG] [axolotl.load_tokenizer:168] [PID:441] [RANK:0] BOS: 1 /
[2023-12-16 22:47:15,026] [DEBUG] [axolotl.load_tokenizer:169] [PID:441] [RANK:0] PAD: 2 /
[2023-12-16 22:47:15,026] [DEBUG] [axolotl.load_tokenizer:170] [PID:441] [RANK:0] UNK: 0 /
[2023-12-16 22:47:15,026] [DEBUG] [axolotl.train.log:60] [PID:441] [RANK:0] loading model and peft_config...
[2023-12-16 22:47:15,187] [INFO] [axolotl.load_model:250] [PID:441] [RANK:0] patching with flash attention
Loading checkpoint shards: 100%|██████████████████| 3/3 [00:07<00:00, 2.55s/it]
[2023-12-16 22:47:24,147] [INFO] [axolotl.load_model:505] [PID:441] [RANK:0] GPU memory usage after model load: 4.343GB (+0.114GB cache, +0.891GB misc)
[2023-12-16 22:47:24,151] [INFO] [axolotl.load_model:528] [PID:441] [RANK:0] converting PEFT model w/ prepare_model_for_kbit_training
[2023-12-16 22:47:24,153] [INFO] [axolotl.load_model:540] [PID:441] [RANK:0] converting modules to torch.bfloat16 for flash attention
[2023-12-16 22:47:24,155] [INFO] [axolotl.load_lora:643] [PID:441] [RANK:0] found linear modules: ['q_proj', 'gate_proj', 'v_proj', 'up_proj', 'down_proj', 'o_proj', 'k_proj']
[2023-12-16 22:47:24,169] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda.:16] [PID:441] CUDA extension not installed.
[2023-12-16 22:47:24,170] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda_old.:15] [PID:441] CUDA extension not installed.
trainable params: 83,886,080 || all params: 7,325,618,176 || trainable%: 1.1451058188485088
[2023-12-16 22:47:24,635] [INFO] [axolotl.load_model:570] [PID:441] [RANK:0] GPU memory usage after adapters: 4.668GB (+0.914GB cache, +0.891GB misc)
[2023-12-16 22:47:24,658] [INFO] [axolotl.train.log:60] [PID:441] [RANK:0] Pre-saving adapter config to ./qlora-out
[2023-12-16 22:47:24,660] [INFO] [axolotl.train.log:60] [PID:441] [RANK:0] Starting trainer...
[2023-12-16 22:47:24,864] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:441] [RANK:0] packing_efficiency_estimate: 0.99 total_num_tokens per device: 10877620
[2023-12-16 22:47:24,879] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:441] [RANK:0] packing_efficiency_estimate: 0.99 total_num_tokens per device: 10877620
[2023-12-16 22:47:24,886] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
0%| | 0/165 [00:00<?, ?it/s]
[2023-12-16 22:47:26,481] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:441] [RANK:0] packing_efficiency_estimate: 0.99 total_num_tokens per device: 10877620
Traceback (most recent call last):
File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/workspace/axolotl/src/axolotl/cli/train.py", line 38, in
fire.Fire(do_cli)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/workspace/axolotl/src/axolotl/cli/train.py", line 34, in do_cli
train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
File "/workspace/axolotl/src/axolotl/train.py", line 129, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1540, in train
return inner_training_loop(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1857, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 2733, in training_step
loss = self.compute_loss(model, inputs)
File "/workspace/axolotl/src/axolotl/core/trainer_builder.py", line 291, in compute_loss
return super().compute_loss(model, inputs, return_outputs=return_outputs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 2756, in compute_loss
outputs = model(**inputs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/utils/operations.py", line 659, in forward
return model_forward(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/utils/operations.py", line 647, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/peft/peft_model.py", line 977, in forward
return self.base_model(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 106, in forward
return self.model.forward(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 1053, in forward
outputs = self.model(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
output = module._old_forward(*args, **kwargs)
File "/workspace/axolotl/src/axolotl/monkeypatch/mistral_attn_hijack_flash.py", line 489, in mistral_model_forward
self._prepare_decoder_attention_mask( # pylint: disable=protected-access
File "/workspace/axolotl/src/axolotl/monkeypatch/mistral_attn_hijack_flash.py", line 103, in _prepare_decoder_attention_mask
sliding_window_mask = _make_sliding_window_causal_mask(
RuntimeError: _make_sliding_window_causal_mask() Expected a value of type 'int' for argument 'sliding_window' but instead found type 'NoneType'.
Position: 5
Value: None
Declaration: _make_sliding_window_causal_mask(int bsz, int tgt_len, int dtype, Device device, int past_key_values_length=0, int sliding_window=4096) -> Tensor
Cast error details: Unable to cast Python instance to C++ type (#define PYBIND11_DETAILED_ERROR_MESSAGES or compile in debug mode for details)
0%| | 0/165 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/root/miniconda3/envs/py3.10/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 994, in launch_command
simple_launcher(args)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 636, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda3/envs/py3.10/bin/python3', '-m', 'axolotl.cli.train', 'examples/mistral/qloraEdited.yml', '--deepspeed', 'deepspeed/zero2.json']' returned non-zero exit status
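For reference, the failure above reduces to the monkeypatch calling the scripted mask helper with sliding_window=None. A guard of the following shape (a minimal illustrative sketch, not the actual axolotl fix) sidesteps it by falling back to a plain causal mask when no window is configured:

```python
# Build a causal attention mask and only apply a sliding-window restriction
# when the config actually defines one (None means "no window", as in v0.2).
import torch

def make_causal_mask(tgt_len: int, sliding_window: int | None = None) -> torch.Tensor:
    """Boolean mask where True marks key positions a query may attend to."""
    i = torch.arange(tgt_len).unsqueeze(1)  # query positions, shape (tgt_len, 1)
    j = torch.arange(tgt_len).unsqueeze(0)  # key positions, shape (1, tgt_len)
    mask = j <= i                           # standard causal mask
    if sliding_window is not None:
        mask &= (i - j) < sliding_window    # keep only the last `sliding_window` keys
    return mask

print(make_causal_mask(5))                    # plain causal mask (v0.2 behaviour)
print(make_causal_mask(5, sliding_window=2))  # windowed causal mask (v0.1 behaviour)
```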
Steps to reproduce
Start the training with a slightly edited QLoRA example config; I only changed the model and the dataset. The base model was Mistral anyway.
Config yaml
No response
Possible solution
No response
Which Operating Systems are you using?
Python Version
3.10
axolotl branch-commit
docker
Acknowledgements