axolotl-ai-cloud / axolotl

Go ahead and axolotl questions
https://axolotl-ai-cloud.github.io/axolotl/
Apache License 2.0

Computation of total_num_steps must include accumulation step #1170

Open jinwonkim93 opened 9 months ago

jinwonkim93 commented 9 months ago

Expected Behavior

total_num_steps should be calculated with gradient accumulation steps taken into account, based on the transformers documentation:

https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments.gradient_accumulation_steps

Current behaviour

https://github.com/OpenAccess-AI-Collective/axolotl/blob/0f77b8d7986c2b5d7773771fabcbe8bc8640cbe4/src/axolotl/utils/trainer.py#L243

total_num_steps does not include the gradient accumulation steps in its computation, but according to the transformers documentation, logging and evaluation happen every gradient_accumulation_steps * step.

The problem is that the scheduler is affected by this max step count.

Steps to reproduce

try preprocessing

Config yaml

No response

Possible solution

No response

Python Version

3.10

axolotl branch-commit

main

winglian commented 9 months ago

I think the total_num_steps accounts for the gradient accumulation steps (GAS) somewhere non-obvious (I can't track it down atm). I tried a test training with GAS=1 and it had 2401 steps, and then I increased it to GAS=4 leaving everything else the same and it had 600 steps.
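The step counts reported here are consistent with a simple integer division of micro-batch steps by GAS. The sketch below only illustrates that arithmetic. The helper name `optimizer_steps` is hypothetical, and the floor division is an assumption chosen because it matches the 2401 and 600 figures; it is not taken from the axolotl or transformers source.

```python
def optimizer_steps(num_micro_batches: int,
                    gradient_accumulation_steps: int,
                    num_epochs: int = 1) -> int:
    """Hypothetical helper: number of optimizer updates for a training run.

    Assumes one optimizer update per `gradient_accumulation_steps`
    micro-batches, with any incomplete trailing group dropped.
    """
    per_epoch = max(num_micro_batches // gradient_accumulation_steps, 1)
    return per_epoch * num_epochs


print(optimizer_steps(2401, 1))  # GAS=1
print(optimizer_steps(2401, 4))  # GAS=4
```

Under these assumptions, the two calls give 2401 and 600, the same numbers as the test runs described above.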

jinwonkim93 commented 9 months ago

> I think the total_num_steps accounts for the gradient accumulation steps (GAS) somewhere non-obvious (I can't track it down atm). I tried a test training with GAS=1 and it had 2401 steps, and then I increased it to GAS=4 leaving everything else the same and it had 600 steps.

The Trainer does account for it internally, but the custom scheduler you made does not, which makes a difference in how the learning rate is updated.

For example:

GAS=1: the learning rate decreases along the cosine curve at every step.

GAS=4: the learning rate decreases along the cosine curve only every 4 steps.
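One way such a mismatch could show up is sketched below in plain Python, with no dependency on axolotl or transformers. The function `cosine_lr` is a hypothetical stand-in for a cosine schedule without warmup, and the step counts reuse the 2401 and 600 figures from this thread. It assumes the scheduler advances once per optimizer update while its horizon is set either to the optimizer-step count or to the micro-batch count.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float = 1.0) -> float:
    """Hypothetical cosine decay from base_lr to 0 over total_steps (no warmup)."""
    progress = min(step / total_steps, 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


micro_batches = 2401                 # steps counted without accumulation
gas = 4                              # gradient accumulation steps
updates = micro_batches // gas       # optimizer updates actually performed

# Scheduler horizon matches the number of optimizer updates:
lr_end_matched = cosine_lr(updates, total_steps=updates)

# Scheduler horizon ignores accumulation (uses the micro-batch count):
lr_end_mismatched = cosine_lr(updates, total_steps=micro_batches)

print(f"final LR, matched horizon:    {lr_end_matched:.4f}")
print(f"final LR, mismatched horizon: {lr_end_mismatched:.4f}")
```

In this sketch the matched schedule reaches zero at the end of training, while the mismatched one covers only about a quarter of the cosine curve and ends at roughly 85% of the base learning rate.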

DreamGenX commented 9 months ago

This may explain: https://github.com/OpenAccess-AI-Collective/axolotl/issues/1100