Closed: redbrain closed this issue 7 months ago.
While looking through the modeling code, I noticed there is a flash attention arg in the Attention layer that defaults to true. However, the model config does not expose such an option. I will need to check again at a later time.
@redbrain Is the 16 GB VRAM of your T4 sufficient to do a full finetune of Phi 1.5? How much GPU memory was needed?
> While looking through the modeling code, I noticed there is a flash attention arg in the Attention layer that defaults to true. However, the model config does not expose such an option. I will need to check again at a later time.
Can you tell me the filename and line number that I need to change in order to execute it without using flash-attn?
Is it possible to run the training code with flash-attn 1.x? I read that flash-attn 1.x supports T4 GPUs. Will there be any dependency conflicts?
@harshdhamecha, we deprecated flash-attn 1 quite a while back. You can simply omit the FA setting in the YAML to disable it.
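For reference, this is roughly what the relevant attention keys look like in the config; leaving them unset (or setting flash_attention to false) keeps flash-attn out of the picture:

```yaml
# Attention backends: leave these unset (or false) to fall back to the
# default PyTorch attention, so flash-attn is never required.
xformers_attention:
flash_attention: false
```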
I believe this should have been resolved now that the HF repo has updated its code.
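For anyone who still hits this, a quick sanity check that the updated checkpoint loads without flash-attn is something like the sketch below. This is not the axolotl training path, just a plain transformers load; it assumes a transformers version recent enough to accept the `attn_implementation` argument, and `microsoft/phi-1_5` is my assumption for the checkpoint being discussed:

```python
# Minimal sketch, not the axolotl code path. Assumes transformers >= 4.36
# (attn_implementation supported) and that "microsoft/phi-1_5" is the
# checkpoint in question.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Request the plain (eager) attention implementation so flash-attn is never imported.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="eager",
)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0]))
```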
@redbrain did you manage to solve this issue eventually?
resolved
Please check that this issue hasn't been reported before.
Expected Behavior
I should be able to finetune the model.
Current behaviour
Result of the final cell of the notebook:
Steps to reproduce
```yaml
load_in_8bit: false
load_in_4bit: false
strict: false

datasets:

dataset_prepared_path: last_run_prepared
val_set_size: 0.05
output_dir: ./phi-sft-out

sequence_len: 2048
sample_packing: true
pad_to_sequence_len:

adapter:
lora_model_dir:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 4
optimizer: adamw_torch
adam_beta2: 0.95
adam_epsilon: 0.00001
max_grad_norm: 1.0
lr_scheduler: cosine
learning_rate: 0.000003

train_on_inputs: false
group_by_length: true
bf16: false
fp16: true
tf32: false

gradient_checkpointing:
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: false

warmup_steps: 100
eval_steps: 0.05
save_steps:
debug:
deepspeed:
weight_decay: 0.1
fsdp:
fsdp_config:
resize_token_embeddings_to_32x: true
special_tokens:
  bos_token: "<|endoftext|>"
  eos_token: "<|endoftext|>"
  unk_token: "<|endoftext|>"
  pad_token: "<|endoftext|>"
```
```
%cd /content/axolotl
!accelerate launch -m axolotl.cli.train examples/phi/phi-ft.yml --deepspeed deepspeed/zero1.json
```