axolotl-ai-cloud / axolotl
https://axolotl-ai-cloud.github.io/axolotl/
Apache License 2.0

DeepSpeed Zero3 is Incompatible with Freeze Range Code #1687

Open josharian opened 4 months ago

josharian commented 4 months ago

Please check that this issue hasn't been reported before.

Expected Behavior

Set up a config like:

unfrozen_parameters:
 - ^model.embed_tokens.weight$[128256:] # only train the new tokens

deepspeed: deepspeed_configs/zero3.json

Train.

Expect something like:

Unfrozen model.embed_tokens.weight with ranges [(128256, 130304)]

Got:

Unfrozen model.embed_tokens.weight with ranges [(128256, 0)]

The range is empty, so the new token embeddings are never actually unfrozen and training does not work as intended.

https://github.com/OpenAccess-AI-Collective/axolotl/pull/1686 will make diagnosis/recognition of this easier. But it doesn't fix the root problem.

AFAICT, the root problem is that deepspeed/zero3.json changes model loading such that the parameters no longer have their original shapes, like this:

>>> print(model.state_dict()["model.embed_tokens.weight"].shape)
torch.Size([0])

As a result, when the range end is None, it is resolved from the (now empty) parameter shape and ends up as 0.
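To illustrate the failure mode, here is a minimal pure-Python sketch of the range resolution. The helper name and signature are hypothetical, not taken from the axolotl codebase; the point is only that an open-ended range resolved against a zero-size first dimension collapses to an empty interval:

```python
def resolve_range(start, end, dim_size):
    """Resolve an unfrozen_parameters range like [128256:] against a
    parameter's first dimension, mimicking Python slice semantics.
    Hypothetical helper for illustration only."""
    resolved_start = start if start is not None else 0
    # An open-ended range falls back to the dimension size reported by
    # the (possibly ZeRO-3-partitioned) parameter.
    resolved_end = end if end is not None else dim_size
    return (resolved_start, resolved_end)

# With the real embedding shape, the open-ended range resolves correctly:
resolve_range(128256, None, 130304)  # -> (128256, 130304)

# Under ZeRO-3 the local parameter has shape torch.Size([0]), so dim_size
# is 0 and we get the broken range from the log:
resolve_range(128256, None, 0)  # -> (128256, 0)
```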

(It also appears that this may break model saving as well. My saved models with deepspeed/zero3.json are far too small, possibly because almost all layers have shape torch.Size([0]).)

Current behaviour

see above

Steps to reproduce

see above

Config yaml

No response

Possible solution

No response

Which Operating Systems are you using?

Python Version

3.11

axolotl branch-commit

whatever the Docker image has (how do I get this from the Docker image?)

Acknowledgements

winglian commented 3 months ago

ZeRO-3 should handle frozen modules, per https://github.com/microsoft/DeepSpeed/pull/2653/files. Are we perhaps freezing/unfreezing too late, after DeepSpeed has wrapped the model?

josharian commented 3 months ago

> Zero3 should handle frozen modules.

I think the trouble is that range freezing relies on having shape information available, and once deepspeed has wrapped the model, that shape information is unavailable.

> Are we perhaps freezing/unfreezing too late after deepspeed has wrapped the model?

That sounds plausible. (Might that also mean that deepspeed isn't as effective as it could be at memory usage?)
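One possible direction (a sketch, assuming DeepSpeed's ZeRO-3 partitioned parameters keep the original shape on a `ds_shape` attribute, as its partitioning internals appear to do): fall back to that attribute instead of `.shape` when resolving ranges. The mock class below stands in for a partitioned parameter, since the real one requires a DeepSpeed runtime:

```python
class FakePartitionedParam:
    """Stand-in for a ZeRO-3 partitioned parameter: the local .shape is
    empty, but the original shape is (assumed to be) kept on .ds_shape."""
    shape = (0,)
    ds_shape = (130304, 4096)

class FakeNormalParam:
    """Stand-in for an ordinary, non-partitioned parameter."""
    shape = (130304, 4096)

def true_shape(param):
    # Prefer ds_shape when present (ZeRO-3 case), else the normal shape.
    return getattr(param, "ds_shape", None) or param.shape

true_shape(FakePartitionedParam())  # -> (130304, 4096)
true_shape(FakeNormalParam())       # -> (130304, 4096)
```

With the true first-dimension size available, an open-ended range like `[128256:]` would resolve to `(128256, 130304)` instead of `(128256, 0)`.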

ccdv-ai commented 3 months ago

@winglian I got the same problem with a ZeRO stage 1 config. Unfreezing an entire layer doesn't work (no gradient).

No problem with FSDP.