Carolinabanana closed this issue 7 months ago
Wow, thanks for pointing this out!
We may take a look at it. @jzhang38
mark
I ran into the same problem... marking this and waiting for an update.
https://github.com/Luodian/Otter/blob/main/pipeline/train/instruction_following.py
Sorry for not pushing the commits to the public repo. Here's a quick fix that we confirmed won't skip data.
When running instruction_following.py, increasing the gradient accumulation steps reduces the step count but does not increase the time per step.
i.e. a gradient accumulation of 100 finishes 100x faster than a gradient accumulation of 1 on the same dataset (I have tested this).
This means data is being skipped when the default gradient accumulation is used, as accumulation should not increase training speed.
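To make the expected behaviour concrete, here is a rough sketch in plain Python (not the Otter code; `simulate_epoch` and its parameters are hypothetical names for illustration only). It just counts work: with correct accumulation, every micro-batch still gets a forward/backward pass, and only the number of optimizer steps shrinks.

```python
def simulate_epoch(num_batches, accum_steps):
    """Count micro-batches processed and optimizer steps for one epoch.

    Hypothetical helper: it only counts work and does not train anything.
    """
    batches_processed = 0
    optimizer_steps = 0
    pending = 0  # micro-batches accumulated since the last optimizer step

    for _ in range(num_batches):
        # A real loop would run forward + backward here for EVERY micro-batch.
        batches_processed += 1
        pending += 1
        if pending == accum_steps:
            # A real loop would call optimizer.step() and zero the grads here.
            optimizer_steps += 1
            pending = 0

    if pending:
        # Flush leftover gradients from an incomplete accumulation window.
        optimizer_steps += 1

    return batches_processed, optimizer_steps


for accum in (1, 10, 100):
    processed, steps = simulate_epoch(1000, accum)
    print(f"accum={accum}: batches processed={processed}, optimizer steps={steps}")
```

Under these assumptions the number of batches processed stays at 1000 for every setting, so total wall-clock time should stay roughly constant while the time per optimizer step grows with the accumulation factor. A 100x speedup suggests the forward/backward passes themselves are not happening for most batches.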