Open jsh2581 opened 3 days ago
base_model: meta-llama/Llama-3.2-3B

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: false

strict: false

chat_template:
output_dir: /workspace/axolotl/3_model/pretraining
skip_prepare_dataset: true
datasets:
  - path: /workspace/axolotl/2_data/dataset-tokenized-8k/train
    split: train
    type:

sequence_len: 8192
sample_packing: false
pad_to_sequence_len: false

# mlflow configuration if you're using it
mlflow_tracking_uri: http://mlflow-server:5000
mlflow_experiment_name: llama-3B
mlflow_run_name: llama-3B

gradient_accumulation_steps: 1
micro_batch_size: 2
# num_epochs: 1
# max_steps: 200000
optimizer: adamw_torch
lr_scheduler: cosine
lr_scheduler_kwargs:
cosine_min_lr_ratio: 1e-3
learning_rate: 1e-5

train_on_inputs: false
group_by_length: false
bf16: true
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
#flash_attention: true

warmup_steps: 20000
#evals_per_epoch: 2
eval_table_size:
save_steps: 40000
debug:
deepspeed:
weight_decay: 0.0
fsdp:
#   - full_shard
#   - auto_wrap
# fsdp_config:
#   fsdp_limit_all_gathers: true
#   fsdp_sync_module_states: true
#   fsdp_offload_params: false
#   fsdp_use_orig_params: false
#   fsdp_cpu_ram_efficient_loading: true
#   fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
#   fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
#   fsdp_state_dict_type: FULL_STATE_DICT
#   fsdp_sharding_strategy: FULL_SHARD
#   fsdp_backward_prefetch: BACKWARD_PRE
special_tokens:
  pad_token: <|end_of_text|>
cc @awhazell, have you seen any duplicate logging to mlflow recently?
Expected Behavior
One log entry per step.
Current behaviour
The log entry for each step is duplicated.
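A minimal sketch for confirming the duplication, assuming the step numbers of a single metric have been exported to a plain list (for example by reading the `step` field of the entries returned by MLflow's `MlflowClient.get_metric_history`). The helper name `find_duplicated_steps` and the sample data are hypothetical and not part of axolotl.

```python
from collections import Counter
from typing import Iterable


def find_duplicated_steps(steps: Iterable[int]) -> dict[int, int]:
    """Return {step: count} for every step that was logged more than once."""
    counts = Counter(steps)
    return {step: n for step, n in sorted(counts.items()) if n > 1}


# Hypothetical step numbers for one metric, as they might look in a run
# where steps 1 and 2 were each recorded twice and step 3 only once.
logged_steps = [1, 1, 2, 2, 3]
print(find_duplicated_steps(logged_steps))
```

An empty result means every step was logged exactly once. A non-empty result lists the affected steps and how many times each one was recorded.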
Steps to reproduce
accelerate launch -m axolotl.cli.train my_config.yml
Config yaml
Possible solution
No response
Which Operating Systems are you using?
Python Version
3.11
axolotl branch-commit
main/8c3a727f9d60ffd3af385f90bcc3fa3a56398fe1