peregilk opened this issue 1 month ago
Hi @peregilk, the new behavior is documented here: https://github.com/AI-Hypercomputer/maxtext/blob/main/getting_started/Data_Input_Pipeline.md#huggingface-pipeline-in-multihost.
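In short, in multihost training each host reads its own subset of the dataset's shards. Here is a minimal sketch of that mechanism using the Hugging Face `datasets` API (the dataset name, host count, and rank below are placeholders, and this illustrates the sharding behavior rather than MaxText's exact code):

```python
from datasets import load_dataset
from datasets.distributed import split_dataset_by_node

# Stream the dataset so shards are read lazily, file by file.
ds = load_dataset("allenai/c4", "en", split="train", streaming=True)

world_size = 8  # number of hosts (placeholder); in JAX, jax.process_count()
rank = 0        # this host's index (placeholder); in JAX, jax.process_index()

# When ds.n_shards is divisible by world_size, each host gets its own
# subset of shards. Hosts holding smaller shards finish earlier, which
# is why behavior can diverge across hosts late in an epoch.
host_ds = split_dataset_by_node(ds, rank=rank, world_size=world_size)
print(host_ds.n_shards)  # shards assigned to this host
```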
@aireenmei Thanks a lot for the explanation. I thought the drop in weights and loss here was hurting the model, and was wondering why it did not show up in my evaluations. Now it makes total sense. Thanks.
@aireenmei Attaching a couple of follow-ups to this thread since they are related. I followed the instructions on the page above and discovered two minor issues:
Thanks for reporting. Yes, setting eval_steps is recommended; it's no longer for debugging only. I'll update the docs.
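For context on why eval_steps matters: with streaming data split across hosts, the eval set has no single well-defined end point, so an uncapped eval loop can run for an unpredictable number of batches. A minimal sketch of the capping pattern (all names here are hypothetical, not MaxText's actual implementation):

```python
def run_eval(eval_batches, eval_step_fn, state, eval_steps):
    """Average eval loss over at most eval_steps batches.

    Capping the loop keeps every host doing the same number of eval
    steps, even when their streaming shards end at different points.
    """
    total, count = 0.0, 0
    for i, batch in enumerate(eval_batches):
        if i >= eval_steps:
            break
        total += eval_step_fn(state, batch)
        count += 1
    return total / max(count, 1)
```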
@aireenmei Referring you here because I think this issue is touched on in #571, where you write:
The behaviour now seems to have changed a bit, and it might even be more confusing. I am not sure what has changed in the code here.
What I am trying to do is switch datasets during training, here from step 160k. The new dataset is a fairly small special-task dataset, and I am studying its effect. It has 256 shards, and one epoch is roughly 350 steps.
Here is what is happening with comments:
This behaviour is a bit unpredictable, especially since some shards can be smaller than others, making it hard to know when the first host runs out of shards. Running out of shards seems to hurt the model.
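To make the unpredictability concrete, here is the back-of-the-envelope arithmetic (the host count is assumed purely for illustration; the shard and step counts are the ones above):

```python
num_shards = 256       # shards in the special-task dataset
steps_per_epoch = 350  # rough epoch length
num_hosts = 8          # assumed for illustration only

shards_per_host = num_shards // num_hosts  # 32 shards per host

# If all shards held equal amounts of data, every host would exhaust
# its 32 shards at roughly the same step (~160k + 350). With uneven
# shards, the host holding the smallest ones runs dry first, possibly
# well before that, and that step cannot be read off the config alone.
print(f"{shards_per_host} shards per host")
```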
What is your advice here?