Closed by Abhishek-TAMU 2 months ago
After testing, I found that the accelerate version is not working as expected.
The new logic introduced in get_state_dict also removes the top-level FSDP wrapper from the model. Since FSDP keeps flattened params, all the parameters managed by the top-level wrapper remain flattened when model.state_dict is called. The child FSDP wrappers still protect their parameters: when the state_dict call recurses into them, they use the FSDP version of state_dict to unwrap themselves.
This results in the following error:
size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([62915840]) from checkpoint, the shape in current model is torch.Size([49152, 2560]).
size mismatch for model.norm.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([2560]).
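As a minimal sketch of this failure mode (outside of accelerate/FSDP, with tiny placeholder shapes rather than the real model sizes), loading a flattened 1-D parameter into a module that still expects the original 2-D weight produces the same kind of size-mismatch error as above:

```python
# Minimal sketch: load a flattened (FSDP-style) 1-D tensor into a module that
# expects the original 2-D embedding weight. Shapes are tiny placeholders, but
# the resulting error has the same form as the size mismatches reported above.
import torch
import torch.nn as nn

embed = nn.Embedding(8, 4)                  # expects weight of shape (8, 4)
flat_state = {"weight": torch.empty(32)}    # flattened param, as left by the top-level wrapper

try:
    embed.load_state_dict(flat_state)
except RuntimeError as err:
    print(err)  # size mismatch for weight: copying a param with shape torch.Size([32]) ...
```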
Description of the change
Removal of the lm_head hack, which was originally added to work around an lm_head issue that is now fixed in newer vLLM versions; the fix landed as of v0.5.4.
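For context, a hypothetical sketch of the kind of post-processing this change removes (the function name and file layout below are assumptions, not the actual fms-hf-tuning code): dropping the tied lm_head.weight from a saved checkpoint so that older vLLM versions could load it.

```python
# Hypothetical sketch of the removed lm_head hack (names and file layout are
# assumptions): strip the tied lm_head.weight from a saved safetensors
# checkpoint so pre-0.5.4 vLLM versions can load it. No longer needed.
import os
from safetensors.torch import load_file, save_file

def strip_lm_head(checkpoint_dir: str) -> None:
    path = os.path.join(checkpoint_dir, "model.safetensors")
    state_dict = load_file(path)
    state_dict.pop("lm_head.weight", None)  # drop the tied head weight
    save_file(state_dict, path)
```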
Related issue number
#1166
How to verify the PR
Ran LoRA and full fine tuning of the granite-3b and llama-8b models without removing lm_head, and was able to run inference on the resulting checkpoints.
Was the PR tested