Closed: e-p-armstrong closed this issue 1 month ago
Please check that this issue hasn't been reported before.
Expected Behavior
Training should proceed without issue when

accelerate launch --use_deepspeed -m axolotl.cli.train axolotl_bittensor_llama3_finetuning.yaml

is run.

Current behaviour
The datasets tokenize, but training invariably fails to begin: axolotl freezes after the datasets are shuffled. This was on a second or third run; on the first run, axolotl froze after the "adding position ids" step instead.
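For a hang like this, one way to see exactly where the stuck processes are (a generic debugging sketch added for illustration, not something from the original report) is to dump the Python stacks of the training ranks with py-spy:

pip install py-spy
# list the training processes, then dump each rank's Python stack;
# <PID> is a placeholder for a real process id from the listing
ps aux | grep axolotl.cli.train
py-spy dump --pid <PID>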
Config:
Oddly enough, I do not see a last_run_prepared folder.

This is on 6x A40s rented through RunPod, using the official axolotl Docker image.
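Since last_run_prepared never appears, running dataset preparation on its own via axolotl's preprocess entry point (a real axolotl CLI module; using it this way is my suggestion, not part of the report) would confirm whether tokenization completes and writes the prepared cache before any distributed launch is involved:

# tokenize and cache the datasets without launching distributed training;
# on success this should populate ./last_run_prepared
python -m axolotl.cli.preprocess axolotl_bittensor_llama3_finetuning.yaml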
Steps to reproduce
Run the launch command above on the 6x A40 setup. Rolling back to commit 5f58555bd0dbf15cae25fc021eb00421e53e47b2 does not seem to have helped.
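If the freeze is a communication hang rather than an axolotl bug, enabling NCCL debug logging on the same launch (a hedged diagnostic sketch using standard NCCL environment variables) should show which rank or collective stalls:

NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=INIT,COLL \
accelerate launch --use_deepspeed -m axolotl.cli.train axolotl_bittensor_llama3_finetuning.yaml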
Config yaml

base_model: meta-llama/Meta-Llama-3-8B # Heralax/bittensor-mistral-pretrained-base-1 #mistralai/Mistral-7B-v0.1 # Heralax/bittensor-mistral-pretrained-base-1 #mistralai/Mistral-7B-v0.1
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
is_mistral_derived_model: false

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: json
    data_files: ./essays_annotation_syspromptvaried.jsonl
    ds_type: json
    type: sharegpt
    conversation: chatml
  - path: json
    data_files: ./tweets_annotation_syspromptvaried.jsonl
    ds_type: json
    type: sharegpt
    conversation: chatml
  - path: json
    data_files: ./autometa_4_percent.json
    ds_type: json
    type: sharegpt
    conversation: chatml
# - path: json
#   data_files: paul_graham_essays_completion.json
#   ds_type: json
#   type: completion

dataset_prepared_path: last_run_prepared
output_dir: ./paulgraham-finetune-out

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true
shuffle_merged_datasets: true

wandb_project: pg-test
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 6
micro_batch_size: 2
eval_batch_size: 1
num_epochs: 7
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.000029
weight_decay: 0
# Gradient clipping max norm
max_grad_norm: 1.0
noisy_embedding_alpha: 0
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: unsloth
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

# fsdp:
#   - full_shard
#   - auto_wrap
# fsdp_config:
#   fsdp_offload_params: false
#   fsdp_state_dict_type: FULL_STATE_DICT
#   fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer

# warmup_steps: 10
warmup_ratio: 0.5
auto_resume_from_checkpoints: false
#warmup_ratio: 0.5
eval_steps: 10
saves_per_epoch: 1
eval_sample_packing: false
save_total_limit: 2
debug:
deepspeed: deepspeed_configs/zero2.json
chat_template: chatml
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
tokens:
  - "<|im_start|>"
  - "<|im_end|>"
Possible solution
No response
Which Operating Systems are you using?
Python Version
3.11
axolotl branch-commit
main/c86c32a
Acknowledgements

Closing comment from e-p-armstrong: Nevermind, might've been an issue with the specific instance. Got a new one and made a new HF token out of paranoia -- it worked.