artidoro / qlora

QLoRA: Efficient Finetuning of Quantized LLMs
https://arxiv.org/abs/2305.14314
MIT License

Adding new tokens causes performance and memory issues #214

Open artidoro opened 1 year ago

artidoro commented 1 year ago

Bug description:

Note that this bug does not affect the results reported in the paper: in our research code, we explicitly froze the embeddings after initializing the model. A temporary fix is to do the same and freeze the embeddings (see the sketch below). This is not satisfactory, however, for use cases where new tokens need to be added and their representations tuned.
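A minimal sketch of that temporary workaround, assuming a LLaMA-style checkpoint loaded through `transformers` (the checkpoint name and the added `[PAD]` token are placeholders, not the exact setup used in this repo):

```python
# Temporary workaround sketch: resize the embedding matrix for newly added
# tokens, then freeze both the input embeddings and the output projection so
# they receive no gradient updates during QLoRA finetuning.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "huggyllama/llama-7b"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Add new tokens and grow the embedding matrix accordingly.
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
model.resize_token_embeddings(len(tokenizer))

# Freeze the input embeddings and the output projection (lm_head).
for param in model.get_input_embeddings().parameters():
    param.requires_grad = False
for param in model.get_output_embeddings().parameters():
    param.requires_grad = False
```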

A more general fix would be to add LoRA layers to the embeddings, or to allow only the newly added embeddings to be trained (a sketch follows below). The LoRA layer for embeddings might not work as well on the output projection layer (the lm_head that maps hidden states back to the vocabulary before the softmax).
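A hedged sketch of the more general fix, using PEFT to attach LoRA adapters to the embedding layer in addition to the attention projections. The module names (`embed_tokens`, `lm_head`, `q_proj`, etc.) assume a LLaMA-style architecture and would need to be adjusted for other model families; this is not the exact configuration shipped in this repo:

```python
# Sketch: target the embedding layer with LoRA so new token representations
# can be tuned alongside the usual attention adapters.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
    # Attention projections plus the (LLaMA-style) input embedding layer.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "embed_tokens"],
    # Alternative: train full, saved copies of the embedding matrices instead
    # of LoRA factors on them:
    # modules_to_save=["embed_tokens", "lm_head"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Restricting training to only the newly added rows of the embedding matrix would instead require masking the gradients of the pre-existing rows (e.g. with a backward hook); the PEFT route above is simply the least invasive option with existing APIs.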

apachemycat commented 1 year ago

Very good patch!

victox5 commented 1 year ago

@artidoro is there a plan to implement fine-tuning of new tokens in the vocabulary?

Thanks for the great repo!

kongjiellx commented 1 year ago

Can you explain why "The LoRA layer for embeddings might not work as well on the output projection layer"? The problem I recently encountered may be related to this. Thanks a lot.

chenjiasheng commented 1 year ago

Need this feature +1