allenai / OLMo

Modeling, training, eval, and inference code for OLMo
https://allenai.org/olmo
Apache License 2.0
4.48k stars 449 forks source link

Fix Z-loss calculation #634

Closed epwalsh closed 3 months ago

epwalsh commented 3 months ago

@2015aroras uncovered a recent bug with how we're calculating Z-loss. It should be average over tokens, not instances. 714647320afa0e226106c25be75c576769cd232c introduced this bug.

See https://github.com/mlfoundations/open_lm/blob/main/open_lm/losses.py for a reference implementation.

dirkgr commented 3 months ago

Can we add a test for this being correct?

epwalsh commented 3 months ago

Can we add a test for this being correct?

@dirkgr we've been YOLO-ing the trainer so far with 0 test coverage. This is bad, but setting up the boilerplate for proper trainer tests is a project in and of itself. My vision is that eventually we move and improve the trainer to OLMo-core, at which point we could add thorough tests.

dirkgr commented 3 months ago

Hm, fair enough, maybe a bigger discussion. I'd rather merge OLMo-core and OLMo, and separately, have more things be functional rather than object oriented.

Because of the objects everywhere, I had to copy and paste so many things in https://github.com/allenai/OLMo/blob/FindHighGnormInstances/scripts/find_high_gnorm_instances.py. But a lot of that stuff doesn't need to be in an object.

And if it's functional, then it can be tested in isolation.

But that's not in scope for this fix.

AkshitaB commented 3 months ago

Can we add a test for this being correct?

@dirkgr we've been YOLO-ing the trainer so far with 0 test coverage. This is bad, but setting up the boilerplate for proper trainer tests is a project in and of itself. My vision is that eventually we move and improve the trainer to OLMo-core, at which point we could add thorough tests.

Based on my experience with the code bugs and the lack of tests lately, I would rather we add tests now as and when we encounter things, instead of waiting to find time to write perfect boilerplate for tests. I think the goal should be to have a place where we at least record the things that require testing, that have failed accidentally without tests. We can always go back and refactor with better testing code.