axolotl-ai-cloud / axolotl

Error full-finetuning Phi in a Colab notebook #655

Closed: redbrain closed this issue 7 months ago

redbrain commented 1 year ago

Please check that this issue hasn't been reported before.

Expected Behavior

I should be able to finetune the model.

Current behaviour

Result of the final cell of the notebook:

/content/axolotl
The following values were not passed to `accelerate launch` and had defaults used instead:
    `--num_processes` was set to a value of `1`
    `--num_machines` was set to a value of `1`
    `--mixed_precision` was set to a value of `'no'`
    `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
2023-09-29 15:26:12.415535: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
                              dP            dP   dP 
                              88            88   88 
   .d8888b. dP.  .dP .d8888b. 88 .d8888b. d8888P 88 
   88'  `88  `8bd8'  88'  `88 88 88'  `88   88   88 
   88.  .88  .d88b.  88.  .88 88 88.  .88   88   88 
   `88888P8 dP'  `dP `88888P' dP `88888P'   dP   dP 

[2023-09-29 15:26:16,521] [WARNING] [axolotl.validate_config:146] [PID:1435] [RANK:0] `pad_to_sequence_len: true` is recommended when using sample_packing
[2023-09-29 15:26:16,521] [WARNING] [axolotl.validate_config:202] [PID:1435] [RANK:0] `trust_remote_code` is set to true. Please make sure that you reviewed the remote code/model.
Downloading (…)lve/main/config.json: 100% 707/707 [00:00<00:00, 2.84MB/s]
Downloading (…)former_sequential.py: 100% 1.86k/1.86k [00:00<00:00, 10.2MB/s]
[2023-09-29 15:26:17,935] [INFO] [axolotl.normalize_config:120] [PID:1435] [RANK:0] GPU memory usage baseline: 0.000GB (+0.255GB misc)
[2023-09-29 15:26:17,936] [WARNING] [axolotl.scripts.check_user_token:261] [PID:1435] [RANK:0] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from https://huggingface.co/settings/tokens if you want to use gated models or datasets.
Downloading (…)okenizer_config.json: 100% 237/237 [00:00<00:00, 1.22MB/s]
Downloading (…)olve/main/vocab.json: 100% 798k/798k [00:00<00:00, 54.8MB/s]
Downloading (…)olve/main/merges.txt: 100% 456k/456k [00:00<00:00, 59.1MB/s]
Downloading (…)/main/tokenizer.json: 100% 2.11M/2.11M [00:00<00:00, 8.61MB/s]
Downloading (…)in/added_tokens.json: 100% 1.08k/1.08k [00:00<00:00, 5.57MB/s]
Downloading (…)cial_tokens_map.json: 100% 99.0/99.0 [00:00<00:00, 513kB/s]
[2023-09-29 15:26:21,413] [DEBUG] [axolotl.load_tokenizer:75] [PID:1435] [RANK:0] EOS: 50256 / <|endoftext|>
[2023-09-29 15:26:21,413] [DEBUG] [axolotl.load_tokenizer:76] [PID:1435] [RANK:0] BOS: 50256 / <|endoftext|>
[2023-09-29 15:26:21,413] [DEBUG] [axolotl.load_tokenizer:77] [PID:1435] [RANK:0] PAD: None / None
[2023-09-29 15:26:21,414] [DEBUG] [axolotl.load_tokenizer:78] [PID:1435] [RANK:0] UNK: 50256 / <|endoftext|>
[2023-09-29 15:26:21,415] [INFO] [axolotl.load_tokenized_prepared_datasets:130] [PID:1435] [RANK:0] Unable to find prepared dataset in last_run_prepared/39976ebbf7007e3555bdc2fb77793749
[2023-09-29 15:26:21,415] [INFO] [axolotl.load_tokenized_prepared_datasets:131] [PID:1435] [RANK:0] Loading raw datasets...
[2023-09-29 15:26:21,415] [INFO] [axolotl.load_tokenized_prepared_datasets:136] [PID:1435] [RANK:0] No seed provided, using default seed of 42
/usr/local/lib/python3.10/dist-packages/datasets/load.py:2089: FutureWarning: 'use_auth_token' was deprecated in favor of 'token' in version 2.14.0 and will be removed in 3.0.0.
You can remove this warning by passing 'token=None' instead.
  warnings.warn(
Downloading readme: 100% 395/395 [00:00<00:00, 2.65MB/s]
Downloading data files:   0% 0/2 [00:00<?, ?it/s]
Downloading data:   0% 0.00/20.9M [00:00<?, ?B/s]
Downloading data:  20% 4.19M/20.9M [00:00<00:01, 10.5MB/s]
Downloading data:  60% 12.6M/20.9M [00:00<00:00, 16.4MB/s]
Downloading data: 100% 20.9M/20.9M [00:01<00:00, 18.8MB/s]
Downloading data files:  50% 1/2 [00:01<00:01,  1.11s/it]
Downloading data:   0% 0.00/1.11M [00:00<?, ?B/s]
Downloading data: 100% 1.11M/1.11M [00:00<00:00, 4.02MB/s]
Downloading data files: 100% 2/2 [00:01<00:00,  1.44it/s]
Extracting data files: 100% 2/2 [00:00<00:00, 2059.56it/s]
Generating train split: 9846 examples [00:00, 72881.72 examples/s]
Generating test split: 518 examples [00:00, 77942.58 examples/s]
Map (num_proc=2): 100% 9846/9846 [00:13<00:00, 732.88 examples/s]
[2023-09-29 15:26:46,756] [INFO] [axolotl.load_tokenized_prepared_datasets:354] [PID:1435] [RANK:0] merging datasets
[2023-09-29 15:26:46,760] [INFO] [axolotl.load_tokenized_prepared_datasets:361] [PID:1435] [RANK:0] Saving merged prepared dataset to disk... last_run_prepared/39976ebbf7007e3555bdc2fb77793749
Saving the dataset (1/1 shards): 100% 10185/10185 [00:00<00:00, 91227.84 examples/s]
Filter (num_proc=2): 100% 9675/9675 [00:07<00:00, 1360.13 examples/s]
Filter (num_proc=2): 100% 510/510 [00:00<00:00, 1194.18 examples/s]
Map (num_proc=2): 100% 9675/9675 [00:03<00:00, 2826.47 examples/s]
Map (num_proc=2): 100% 9675/9675 [00:04<00:00, 2102.29 examples/s]
Map (num_proc=2): 100% 510/510 [00:00<00:00, 990.74 examples/s] 
[2023-09-29 15:27:03,328] [INFO] [axolotl.calculate_total_num_steps:438] [PID:1435] [RANK:0] calculating total_num_tokens
[2023-09-29 15:27:03,359] [INFO] [axolotl.calculate_total_num_steps:445] [PID:1435] [RANK:0] total_num_tokens: 5198560
[2023-09-29 15:27:03,532] [INFO] [axolotl.calculate_total_num_steps:455] [PID:1435] [RANK:0] `total_supervised_tokens: 5198560`
[2023-09-29 15:27:03,587] [INFO] [axolotl.utils.dataloader.generate_batches:181] [PID:1435] [RANK:0] generating packed batches
[2023-09-29 15:27:03,601] [INFO] [axolotl.utils.dataloader.generate_batches:187] [PID:1435] [RANK:0] b1894177f41e8f3d63409640e827d5acf5abdaf3ca878d1853b47bfcf660b9dd
[2023-09-29 15:27:10,405] [INFO] [axolotl.utils.dataloader.len_w_stats:295] [PID:1435] [RANK:0] packing_efficiency_estimate: 1.0 actual packing efficiency: 0.8050616476371709
[2023-09-29 15:27:10,405] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:1435] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 5198560
[2023-09-29 15:27:10,405] [INFO] [axolotl.calculate_total_num_steps:504] [PID:1435] [RANK:0] data_loader_len: 2511
[2023-09-29 15:27:10,405] [INFO] [axolotl.calc_sample_packing_eff_est:510] [PID:1435] [RANK:0] sample_packing_eff_est across ranks: [0.8050616476371709]
[2023-09-29 15:27:10,405] [INFO] [axolotl.calculate_total_num_steps:521] [PID:1435] [RANK:0] sample_packing_eff_est: 0.81
[2023-09-29 15:27:10,405] [INFO] [axolotl.calculate_total_num_steps:526] [PID:1435] [RANK:0] total_num_steps: 10044
[2023-09-29 15:27:10,406] [INFO] [axolotl.train.train:48] [PID:1435] [RANK:0] loading tokenizer... microsoft/phi-1_5
[2023-09-29 15:27:10,723] [DEBUG] [axolotl.load_tokenizer:75] [PID:1435] [RANK:0] EOS: 50256 / <|endoftext|>
[2023-09-29 15:27:10,723] [DEBUG] [axolotl.load_tokenizer:76] [PID:1435] [RANK:0] BOS: 50256 / <|endoftext|>
[2023-09-29 15:27:10,723] [DEBUG] [axolotl.load_tokenizer:77] [PID:1435] [RANK:0] PAD: None / None
[2023-09-29 15:27:10,723] [DEBUG] [axolotl.load_tokenizer:78] [PID:1435] [RANK:0] UNK: 50256 / <|endoftext|>
[2023-09-29 15:27:10,724] [INFO] [axolotl.train.train:56] [PID:1435] [RANK:0] loading model and (optionally) peft_config...
Downloading pytorch_model.bin: 100% 2.84G/2.84G [00:12<00:00, 229MB/s]
Downloading (…)neration_config.json: 100% 69.0/69.0 [00:00<00:00, 301kB/s]
[2023-09-29 15:28:06,962] [INFO] [axolotl.train.train:108] [PID:1435] [RANK:0] Starting trainer...
[2023-09-29 15:28:06,962] [INFO] [axolotl.train.train:110] [PID:1435] [RANK:0] hang tight... sorting dataset for group_by_length
[2023-09-29 15:28:07,378] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:1435] [RANK:0] packing_efficiency_estimate: 0.81 total_num_tokens per device: 5198560
[2023-09-29 15:28:07,378] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:1435] [RANK:0] packing_efficiency_estimate: 0.81 total_num_tokens per device: 5198560
[2023-09-29 15:28:07,412] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
  0% 0/12404 [00:00<?, ?it/s][2023-09-29 15:28:22,992] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:1435] [RANK:0] packing_efficiency_estimate: 0.81 total_num_tokens per device: 5198560
[2023-09-29 15:28:22,992] [INFO] [axolotl.utils.dataloader.generate_batches:181] [PID:1435] [RANK:0] generating packed batches
[2023-09-29 15:28:23,029] [INFO] [axolotl.utils.dataloader.generate_batches:187] [PID:1435] [RANK:0] 130ef54177b2d9d54eff77f8da7110cded9728e272f04c9fe113b59be8643021
[2023-09-29 15:28:23,038] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:1435] [RANK:0] packing_efficiency_estimate: 0.81 total_num_tokens per device: 5198560
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/content/axolotl/src/axolotl/cli/train.py", line 38, in <module>
    fire.Fire(do_cli)
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/content/axolotl/src/axolotl/cli/train.py", line 34, in do_cli
    train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
  File "/content/axolotl/src/axolotl/train.py", line 118, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1591, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1892, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2776, in training_step
    loss = self.compute_loss(model, inputs)
  File "/content/axolotl/src/axolotl/utils/trainer.py", line 310, in compute_loss
    return super().compute_loss(model, inputs, return_outputs=return_outputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2801, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 636, in forward
    return model_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 624, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/usr/local/lib/python3.10/dist-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/content/axolotl/src/axolotl/models/phi/modeling_mixformer_sequential.py", line 910, in forward
    lm_logits = self.layers(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/axolotl/src/axolotl/models/phi/modeling_mixformer_sequential.py", line 838, in forward
    input = module(input, cu_seqlens=cu_seqlens, max_seqlen=max_seqlen)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/axolotl/src/axolotl/models/phi/modeling_mixformer_sequential.py", line 727, in forward
    attn_outputs = self.mixer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/axolotl/src/axolotl/models/phi/modeling_mixformer_sequential.py", line 674, in forward
    context = self.inner_attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/axolotl/src/axolotl/models/phi/modeling_mixformer_sequential.py", line 438, in forward
    return flash_attn_varlen_qkvpacked_func(
  File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 752, in flash_attn_varlen_qkvpacked_func
    return FlashAttnVarlenQKVPackedFunc.apply(
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 255, in forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = _flash_attn_varlen_forward(
  File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 79, in _flash_attn_varlen_forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.varlen_fwd(
RuntimeError: FlashAttention only supports Ampere GPUs or newer.
  0% 0/12404 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 986, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-m', 'axolotl.cli.train', 'examples/phi/phi-ft.yml', '--deepspeed', 'deepspeed/zero1.json']' returned non-zero exit status 1.

Steps to reproduce

  1. Create a Colab notebook, running on a free T4 GPU.
  2. Run this cell:
    !git clone https://github.com/OpenAccess-AI-Collective/axolotl.git
    %cd axolotl
    !pip install packaging
    !pip install -e '.[flash-attn,deepspeed]'
  3. Restart the runtime when prompted.
  4. Edit the YAML file, either using the UI or this cell:
    
    %%writefile /content/axolotl/examples/phi/phi-ft.yml
    base_model: microsoft/phi-1_5
    base_model_config: microsoft/phi-1_5
    model_type: MixFormerSequentialForCausalLM
    tokenizer_type: AutoTokenizer
    is_llama_derived_model: false
    trust_remote_code: true

    load_in_8bit: false
    load_in_4bit: false
    strict: false

    datasets:

    dataset_prepared_path: last_run_prepared
    val_set_size: 0.05
    output_dir: ./phi-sft-out

    sequence_len: 2048
    sample_packing: true
    pad_to_sequence_len:

    adapter:
    lora_model_dir:
    lora_r:
    lora_alpha:
    lora_dropout:
    lora_target_linear:
    lora_fan_in_fan_out:

    wandb_project:
    wandb_entity:
    wandb_watch:
    wandb_run_id:
    wandb_log_model:

    gradient_accumulation_steps: 1
    micro_batch_size: 1
    num_epochs: 4
    optimizer: adamw_torch
    adam_beta2: 0.95
    adam_epsilon: 0.00001
    max_grad_norm: 1.0
    lr_scheduler: cosine
    learning_rate: 0.000003

    train_on_inputs: false
    group_by_length: true
    bf16: false
    fp16: true
    tf32: false

    gradient_checkpointing:
    early_stopping_patience:
    resume_from_checkpoint:
    local_rank:
    logging_steps: 1
    xformers_attention:
    flash_attention: false

    warmup_steps: 100
    eval_steps: 0.05
    save_steps:
    debug:
    deepspeed:
    weight_decay: 0.1
    fsdp:
    fsdp_config:
    resize_token_embeddings_to_32x: true
    special_tokens:
      bos_token: "<|endoftext|>"
      eos_token: "<|endoftext|>"
      unk_token: "<|endoftext|>"
      pad_token: "<|endoftext|>"

  5. Run this cell:

    %cd /content/axolotl
    !accelerate launch -m axolotl.cli.train examples/phi/phi-ft.yml --deepspeed deepspeed/zero1.json



### Possible solution

It seems to be trying to use flash attention even though `flash_attention` is explicitly set to false in the YAML. (The error also occurs if it's left blank.) Is this Axolotl's fault?
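
Worth noting: whatever the YAML says, the `RuntimeError` above is a hardware constraint. The flash-attn 2 kernels require an Ampere-or-newer GPU (compute capability 8.0+), and Colab's free T4 reports 7.5. A minimal sketch to confirm this from inside the notebook:

```python
# Minimal check: flash-attn 2 only runs on compute capability >= 8.0 (Ampere).
# Colab's free T4 reports (7, 5), so any code path that reaches the flash-attn
# kernels will raise, regardless of the YAML setting.
import torch

major, minor = torch.cuda.get_device_capability()
print(f"compute capability: {major}.{minor}")
if (major, minor) < (8, 0):
    print("pre-Ampere GPU: flash-attn 2 kernels cannot run here")
```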

### Which Operating Systems are you using?

- [X] Linux
- [ ] macOS
- [ ] Windows

### Python Version

3.10.12

### axolotl branch-commit

main/5b0bc48 (latest at time of filing)

### Acknowledgements

- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this bug has not been reported yet.
- [X] I am using the latest version of axolotl.
- [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.
NanoCode012 commented 1 year ago

While looking through the modeling code, I think there is a flash attention arg in the Attention layer that defaults to true. However, the model config does not expose such an option. I will need to check again at a later time.
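
A possible stopgap while that default is investigated would be to force the flag off before the model is built. This is only a sketch: the class name `MHA` and the kwarg name `flash_attn` below are guesses inferred from the traceback, and would need to be checked against the vendored `modeling_mixformer_sequential.py`:

```python
# Hypothetical monkeypatch: force the (assumed) flash_attn constructor flag
# off before model construction. Both `MHA` and `flash_attn` are assumptions
# and must be verified against modeling_mixformer_sequential.py.
import axolotl.models.phi.modeling_mixformer_sequential as mms

_orig_init = mms.MHA.__init__  # class name assumed from the traceback's self.mixer

def _patched_init(self, *args, **kwargs):
    kwargs["flash_attn"] = False  # assumed kwarg name
    _orig_init(self, *args, **kwargs)

mms.MHA.__init__ = _patched_init
```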

gameveloster commented 10 months ago

@redbrain Is the 16 GB of VRAM on your T4 sufficient to do a full finetune of Phi 1.5? How much GPU memory was needed?

harshdhamecha commented 9 months ago

> While looking through the modeling code, I think there is a flash attention arg in the Attention layer that defaults to true. However, the model config does not expose such an option. I will need to check again at a later time.

Can you tell me the filename and line number that I need to change in order to execute it without using flash-attn?

Is it possible to run the training code with flash-attn 1.x? I read that flash-attn 1.x supports T4 GPUs. Will there be any dependency conflicts?

NanoCode012 commented 9 months ago

@harshdhamecha, we deprecated flash-attn 1 quite a while back. You can simply omit the `flash_attention` setting in the YAML to disable it.
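
For anyone double-checking, here is a tiny sanity check (a sketch, assuming the config path used earlier in this thread and that PyYAML is available) to confirm what axolotl will actually read before launching:

```python
# Sketch: confirm flash_attention is unset/false in the config before launching.
import yaml

with open("examples/phi/phi-ft.yml") as f:
    cfg = yaml.safe_load(f)

# A blank YAML value parses to None, which also counts as disabled here.
assert not cfg.get("flash_attention"), "flash_attention is still enabled"
print("flash_attention:", cfg.get("flash_attention"))
```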

NanoCode012 commented 7 months ago

I believe this should now be solved, as the HF repo has updated its code.
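
One lightweight way to confirm, sketched with the stock transformers API (this resolves the remote modeling code without downloading the weights):

```python
# Sketch: resolve microsoft/phi-1_5's current remote code from the Hub.
# If the upstream fix landed, this pulls the updated modeling files
# rather than axolotl's vendored copy.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
print(type(config).__name__)
```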

jacobahtan commented 4 months ago

@redbrain did you manage to solve this issue eventually?

jacobahtan commented 3 months ago

resolved