Luodian / Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
https://otter-ntu.github.io/
MIT License

Gradient accumulation appears to skip data #304

Closed Carolinabanana closed 7 months ago

Carolinabanana commented 8 months ago

When running instruction_following.py, increasing the gradient accumulation value reduces the step count but does not increase the per-step time.

i.e. with gradient accumulation set to 100, training finishes 100x faster than with gradient accumulation 1 on the same dataset (I have tested this).

This means data is being skipped when gradient accumulation is enabled, since accumulation should reduce the number of optimizer updates, not the total training time for the same amount of data.
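This symptom is consistent with a loop that advances the dataloader every step but only processes every Nth batch. The sketch below is purely illustrative (it is not Otter's actual training code); it contrasts a hypothetical buggy pattern that skips batches with a correct accumulation pattern where every batch contributes a backward pass.

```python
def buggy_used_batches(num_batches, accum_steps):
    """Suspected bug pattern: only every Nth batch is actually used,
    so the run finishes ~accum_steps times faster and drops data."""
    used = 0
    for step in range(num_batches):
        if step % accum_steps != 0:
            continue  # batch is fetched but its loss is never computed
        used += 1     # only this batch contributes a gradient
    return used


def correct_used_batches(num_batches, accum_steps):
    """Correct pattern: every batch gets a backward pass; only the
    optimizer step happens less often."""
    used, updates = 0, 0
    for step in range(num_batches):
        used += 1  # loss.backward() on every batch accumulates gradients
        if (step + 1) % accum_steps == 0:
            updates += 1  # optimizer.step(); optimizer.zero_grad()
    return used, updates
```

With 1000 batches and accumulation 100, the buggy version touches only 10 batches while the correct version touches all 1000 and performs 10 optimizer updates, matching the 100x speedup reported above.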

Luodian commented 8 months ago

wow, thanks for pointing this out!

We may take a look at it. @jzhang38

peiliu0408 commented 8 months ago

mark

LaoRann commented 8 months ago

I met the same problem... marking and waiting for an update.

Luodian commented 7 months ago

https://github.com/Luodian/Otter/blob/main/pipeline/train/instruction_following.py

Sorry for not pushing the commits to the public repo earlier. Here's a quick fix that we confirmed won't skip data.
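For readers landing here, the usual shape of such a fix (the exact change lives in the linked file; the function and variable names below are hypothetical) is to scale the loss and call backward on every batch, stepping the optimizer only every `accum_steps` batches. This toy version uses plain floats in place of real gradients to show that each update equals the mean gradient over its group of batches:

```python
def accumulated_updates(batch_grads, accum_steps):
    """Simulate the fixed loop: every batch's scaled gradient is
    accumulated, and an optimizer update fires every accum_steps batches.
    Returns the effective gradient applied at each update."""
    updates = []
    running = 0.0
    for i, g in enumerate(batch_grads):
        running += g / accum_steps   # loss = loss / accum_steps, then backward
        if (i + 1) % accum_steps == 0:
            updates.append(running)  # optimizer.step()
            running = 0.0            # optimizer.zero_grad()
    return updates
```

For example, `accumulated_updates([1.0, 2.0, 3.0, 4.0], 2)` yields `[1.5, 3.5]`: each update is the average gradient of its two batches, so no data is dropped and the effective batch size grows by `accum_steps`.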