axolotl-ai-cloud / axolotl

https://axolotl-ai-cloud.github.io/axolotl/

Error unfreezing intermediate layers #1747

Open ayushsml opened 1 month ago

ayushsml commented 1 month ago

Please check that this issue hasn't been reported before.

Expected Behavior

Training should run normally, updating only the parameters of the unfrozen transformer layers.

Current behaviour

Error

 File "~/axolotl/env/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 149, in __init__
[rank2]:     self.dtype = self.optimizer.param_groups[0]['params'][0].dtype
[rank2]: IndexError: list index out of range
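
For context, the failing line reads the dtype of the first trainable parameter handed to the optimizer. If the layer freezing described below leaves no parameter with requires_grad=True, the Trainer builds an optimizer whose parameter group is empty, and DeepSpeed's ZeRO-3 wrapper then fails on that index lookup. A minimal sketch of the failure mode in plain PyTorch (illustrative only, not axolotl's or DeepSpeed's actual code apart from the quoted line):

import torch
import torch.nn as nn

model = nn.Linear(8, 8)

# Freeze everything: the same end state as when no unfrozen_parameters
# pattern matches any parameter name.
for p in model.parameters():
    p.requires_grad = False

# Trainer-style construction: only parameters with requires_grad=True are
# collected, so the single param group ends up empty.
param_groups = [{"params": [p for p in model.parameters() if p.requires_grad]}]
optimizer = torch.optim.AdamW(param_groups, lr=1e-4)

print(optimizer.param_groups[0]["params"])  # []
# DeepSpeed's stage3.py then evaluates the quoted line, which raises IndexError:
# self.dtype = self.optimizer.param_groups[0]['params'][0].dtype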

Steps to reproduce

Training runs as expected when I unfreeze embed_tokens or lm_head, or when I do not freeze anything at all. The error occurs when I try to unfreeze only the transformer layers, as in the config below.
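
A plausible explanation, assuming axolotl applies the unfrozen_parameters entries as regexes against model.named_parameters(): on a Llama-architecture model the parameter names start with model.layers.N, so the transformer.blocks.[0-7]. pattern in the config below matches nothing, every parameter stays frozen, and the optimizer receives an empty parameter list as in the traceback above. A rough sketch of that matching (the helper is illustrative, not axolotl's implementation):

import re

# Sample parameter names from a Hugging Face Llama model.
llama_param_names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.7.mlp.gate_proj.weight",
    "lm_head.weight",
]

def select_unfrozen(names, patterns):
    # Keep only the names matching at least one unfreeze pattern.
    return [n for n in names if any(re.search(p, n) for p in patterns)]

print(select_unfrozen(llama_param_names, [r"transformer.blocks.[0-7]."]))
# []  (nothing would be unfrozen)
print(select_unfrozen(llama_param_names, [r"^model\.layers\.[0-7]\."]))
# ['model.layers.0.self_attn.q_proj.weight', 'model.layers.7.mlp.gate_proj.weight']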

Config yaml

base_model: ~/results/run1/checkpoint-2000
model_type: AutoModelForCausalLM
tokenizer_config: ~/tokenizer/final_tokenizer_hf
tokenizer_type: LlamaTokenizer
trust_remote_code: true
# Resize the model embeddings when new tokens are added to multiples of 32
# This is reported to improve training speed on some models
resize_token_embeddings_to_32x: true

load_in_8bit: false
load_in_4bit: false
strict: false

unfrozen_parameters:
  - transformer.blocks.[0-7].
  # - ^lm_head.weight$
  # - ^model.embed_tokens.weight$

model_config:
  output_router_logits: true

datasets:
  - path: json 
    type: "completion"
    data_files: ~/data/data.jsonl
    ds_type: json
output_dir: ~/results/

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true
logging_steps: 1
warmup_steps: 10
gradient_accumulation_steps: 4
micro_batch_size: 8
num_epochs: 3
max_steps: 2000
eval_steps: 100
optimizer: adamw_hf
lr_scheduler: cosine
learning_rate: 0.0001

# wandb_project:
# wandb_key : 
# wandb_entity: 
# wandb_name: 
# wandb_log_model: 

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint: 
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

save_total_limit: 1
save_steps: 100
debug:
deepspeed: ~/axolotl/deepspeed_configs/zero3_bf16_cpuoffload_all.json
weight_decay: 0.0
fsdp:
fsdp_config:

Possible solution

No response

Which Operating Systems are you using?

Python Version

3.10

axolotl branch-commit

main

Acknowledgements

winglian commented 1 month ago

transformer.blocks.[0-7]. doesn't match up with any llama models. It looks like you're using the llama tokenizer though. There isn't enough information here for me to help without knowing the model architecture.
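
One quick way to confirm the real parameter names for this checkpoint is to print them (a sketch; it assumes the checkpoint loads with transformers' AutoModelForCausalLM and uses the path from the config above):

import os

from transformers import AutoModelForCausalLM

# "~" is not expanded by from_pretrained, so expand it explicitly.
ckpt = os.path.expanduser("~/results/run1/checkpoint-2000")
model = AutoModelForCausalLM.from_pretrained(ckpt)

for name, _ in list(model.named_parameters())[:10]:
    print(name)
# A Llama 2 model prints names such as:
#   model.embed_tokens.weight
#   model.layers.0.self_attn.q_proj.weight
#   model.layers.0.self_attn.k_proj.weight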

ayushbits commented 1 month ago

Thanks for the reply @winglian. I am using Llama 2 as the base model. I tried the configuration below as well, but I received the same error as before.

unfrozen_parameters:
  - model.layers.*
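
Before relaunching, it may also help to check how many parameters a given pattern actually selects. A rough sanity check (it applies re.search to the parameter names, which may not mirror axolotl's own matching exactly):

import os
import re

from transformers import AutoModelForCausalLM

ckpt = os.path.expanduser("~/results/run1/checkpoint-2000")
model = AutoModelForCausalLM.from_pretrained(ckpt)

pattern = r"^model\.layers\.[0-7]\."  # anchored pattern for the first eight blocks
matched = [n for n, _ in model.named_parameters() if re.search(pattern, n)]
print(f"{len(matched)} parameters match {pattern!r}")
# If this prints 0, every parameter stays frozen and DeepSpeed ZeRO-3 fails
# with the same IndexError as in the traceback above.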