Lightning-AI / litgpt

Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.
https://lightning.ai
Apache License 2.0

Will CycleIterator forward to dataset on resume for pretrain? #1386

Open · calvintwr opened this issue 1 week ago

calvintwr commented 1 week ago

When resuming finetuning, I see that the CycleIterator is fast-forwarded through the dataset to the point where iteration should continue:

https://github.com/Lightning-AI/litgpt/blob/f3343784bbd192490e2a70aa5ef75c52608b1d35/litgpt/finetune/full.py#L208-L219
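For context, a minimal sketch of what that fast-forward does, assuming the checkpoint stores the iteration count (the names here are illustrative, not the exact identifiers in full.py):

```python
# Sketch: replay batches until the saved iteration is reached,
# so the iterator resumes at the same position in the dataset.
train_iterator = CycleIterator(train_dataloader)
if resume:
    for _ in range(initial_iter):  # initial_iter: iteration count restored from the checkpoint
        next(train_iterator)       # consume a batch without running a training step
```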

However, for pretraining this logic does not exist, and training seems to resume from the beginning:

https://github.com/Lightning-AI/litgpt/blob/f3343784bbd192490e2a70aa5ef75c52608b1d35/litgpt/pretrain.py#L217-L271

Can I check: in this case, it looks like resumed pretraining will start from the beginning of the dataset rather than being fast-forwarded to the previous position?

awaelchli commented 1 week ago

The pretraining code uses a stateful dataloader from LitData: https://github.com/Lightning-AI/litgpt/blob/f3343784bbd192490e2a70aa5ef75c52608b1d35/litgpt/pretrain.py#L192-L198
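For clarity, here is a minimal sketch of how a stateful LitData `StreamingDataLoader` resumes; the dataset path and batch size are illustrative, not taken from litgpt:

```python
from litdata import StreamingDataset, StreamingDataLoader

# Illustrative input path; litgpt wires this up from the pretraining data config.
dataset = StreamingDataset("data/optimized-corpus")
loader = StreamingDataLoader(dataset, batch_size=8)

state = loader.state_dict()    # current sample position, saved alongside the checkpoint
# ... process restarts and the checkpoint is loaded ...
loader.load_state_dict(state)  # iteration continues from the saved position
```

Because the dataloader carries its own state, resuming does not need to replay batches the way the finetuning script does.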