LAION-AI / Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
https://open-assistant.io
Apache License 2.0
36.92k stars 3.22k forks

For PEFT training, how to handle a changed tokenizer? #3648

Open zhanglu0704 opened 1 year ago

zhanglu0704 commented 1 year ago

If the model's num_embeddings is 10000 but we change the tokenizer to 10007 tokens, then after SFT training the model's num_embeddings will be 10016. That is because get_model(conf, tokenizer, pad_vocab_size_to_multiple_of=16, check_freeze_layer=True) in model/model_training/utils/utils.py has the parameter pad_vocab_size_to_multiple_of=16. But when we then try to start a PEFT training, it fails because of the following code: if len(tokenizer) != n_embs and check_freeze_layer: assert not conf.freeze_layer, "Cannot change the number of embeddings if the model is frozen."
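
To make the numbers concrete, here is a minimal sketch of the padding arithmetic and of the guard quoted above. It is plain Python with no model loaded. The helper names padded_vocab_size and resize_allowed are hypothetical and not part of the Open-Assistant code; only the condition inside resize_allowed is taken from the quoted snippet. It also assumes conf.freeze_layer is set for the PEFT run.

```python
def padded_vocab_size(vocab_size: int, multiple_of: int = 16) -> int:
    """Round vocab_size up to the nearest multiple of multiple_of."""
    return -(-vocab_size // multiple_of) * multiple_of


def resize_allowed(tokenizer_len: int, n_embs: int, freeze_layer: bool,
                   check_freeze_layer: bool = True) -> bool:
    """Hypothetical stand-in for the quoted guard.

    Returns False in the case where the original code would raise the
    "Cannot change the number of embeddings" assertion.
    """
    if tokenizer_len != n_embs and check_freeze_layer:
        return not freeze_layer
    return True


tokenizer_len = 10007                                    # tokenizer after adding tokens
n_embs_after_sft = padded_vocab_size(tokenizer_len, 16)  # embedding rows after SFT
print(n_embs_after_sft)  # 10016

# The SFT checkpoint has 10016 rows but the tokenizer still has 10007 entries,
# so the sizes differ and the guard trips when layers are frozen.
print(resize_allowed(tokenizer_len, n_embs_after_sft, freeze_layer=True))  # False
```

So the mismatch comes from the padding alone: the tokenizer itself has not changed between SFT and PEFT, but len(tokenizer) no longer equals the padded embedding count.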