foundation-model-stack / fms-fsdp

🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and an SDPA implementation of Flash Attention v2.
https://pytorch.org/docs/stable/fsdp.html
Apache License 2.0

The model conversion to HF is broken with the latest Fused GatedLinearUnit support in ibm-fms 0.0.6 #88

Closed thinkahead closed 1 month ago

thinkahead commented 1 month ago

The latest changes in 0.0.6 (https://github.com/foundation-model-stack/foundation-model-stack/commit/eccd6028cec75f84ce1834a3a18649f5d8fc0641) break the model conversion code.

I am switching back to the foundation-model-stack commit d04def43e9eb8a4e0adf7285c59dd66274e1b724, which still works.

JRosenkranz commented 1 month ago

> The latest changes in 0.0.6 foundation-model-stack/foundation-model-stack@eccd602 break the model conversion code.
>
> I am switching back to the foundation-model-stack commit d04def43e9eb8a4e0adf7285c59dd66274e1b724, which still works.

Yes, looks like this will need to be updated.

lchu-ibm commented 1 month ago

Fixed in https://github.com/foundation-model-stack/fms-fsdp/commit/1b589aea239f9ca05bf078372eaeb880c5a10509.

For a model trained with the new fms, you can convert it as is;

for a model trained with the old fms, you can convert it with the --is_old_fms flag, e.g.

python fms_to_hf.py --compiled --is_old_fms --model_variant $MODEL_VARIANT --load_path $IN_PATH --save_path $OUT_PATH --tokenizer_name_or_path $TOKENIZER
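For context on what changed: the 0.0.6 commit referenced above adds fused GatedLinearUnit support, i.e. the feed-forward gate and up projections live in a single fused tensor, whereas the Hugging Face LLaMA-style layout keeps gate_proj and up_proj separate. The sketch below is only a minimal illustration of the kind of split a converter has to perform on a fused checkpoint; the key names, module paths, and the gate-first ordering are assumptions made for illustration, not the actual fms_to_hf.py logic.

```python
import torch

def split_fused_glu(fused_weight: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    # Assumption: the fused projection stacks [gate; up] along dim 0,
    # giving shape (2 * intermediate_size, hidden_size).
    gate_proj, up_proj = torch.chunk(fused_weight, 2, dim=0)
    return gate_proj, up_proj

# Toy sizes for illustration only; real model dimensions are much larger.
hidden_size, intermediate_size = 8, 16
fms_state_dict = {
    # Hypothetical fused key name; the real fms key differs.
    "layers.0.ff_sub_layer.wg1_fused.weight": torch.randn(2 * intermediate_size, hidden_size),
}

hf_state_dict = {}
fused = fms_state_dict.pop("layers.0.ff_sub_layer.wg1_fused.weight")
gate, up = split_fused_glu(fused)
# HF LLaMA-style target keys, kept separate rather than fused.
hf_state_dict["model.layers.0.mlp.gate_proj.weight"] = gate
hf_state_dict["model.layers.0.mlp.up_proj.weight"] = up

print({k: tuple(v.shape) for k, v in hf_state_dict.items()})
```

An old-fms checkpoint would already hold the two projections as separate tensors (under different key names), which is presumably why the converter needs the --is_old_fms switch to pick the right mapping.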