Open artidoro opened 1 year ago
Very good patch!
@artidoro is there a plan to implement fine-tuning of new tokens in the vocabulary?
Thanks for the great repo!
Can you explain why "The LoRA layer for embeddings might not work as well on the output projection layer"? The problem I recently encountered may be related to this. Thanks a lot.
Need this feature +1
Bug description:
The embedding layers end up with `requires_grad = True` after `resize_token_embeddings()` is called. Note that this bug does not affect the results mentioned in the paper: in our research code, we were explicitly freezing the embeddings after initializing the model. A temporary fix is the same solution of freezing the embeddings again. This fix is not satisfactory for use cases where new tokens need to be added and their corresponding representations tuned.
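A minimal sketch of the bug and the temporary fix, using a toy stand-in module (the class and its `resize_token_embeddings` are hypothetical; real code would call the method of the same name on a `transformers` model, which similarly re-creates the embedding modules):

```python
import torch
import torch.nn as nn

# Toy stand-in for a causal LM (hypothetical; illustrates why resizing
# re-enables gradients on previously frozen embeddings).
class TinyLM(nn.Module):
    def __init__(self, vocab_size, dim=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.lm_head = nn.Linear(dim, vocab_size, bias=False)

    def resize_token_embeddings(self, new_vocab):
        # Mimics transformers: fresh modules are created, so any earlier
        # requires_grad=False setting on the old weights is lost.
        old = self.embed
        self.embed = nn.Embedding(new_vocab, old.embedding_dim)
        with torch.no_grad():
            self.embed.weight[: old.num_embeddings] = old.weight
        self.lm_head = nn.Linear(old.embedding_dim, new_vocab, bias=False)

model = TinyLM(vocab_size=10)
model.embed.weight.requires_grad = False   # frozen for (Q)LoRA training
model.resize_token_embeddings(12)          # add two new tokens
print(model.embed.weight.requires_grad)    # True again -> the bug

# Temporary fix: explicitly re-freeze the embeddings after resizing.
model.embed.weight.requires_grad = False
model.lm_head.weight.requires_grad = False
```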
A more general fix would be adding LoRA layers to the embeddings or allowing only the new embeddings to be trained. The LoRA layer for embeddings might not work as well on the output projection layer (mapping back to the vocabulary before softmax).
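One way to sketch the "train only the new embeddings" option (not from the repo; a plain PyTorch illustration): keep the whole embedding trainable but register a gradient hook that zeros out the rows of the original vocabulary, so only the newly added token embeddings receive updates.

```python
import torch
import torch.nn as nn

# Assumed sizes for illustration: 10 original tokens, 2 new ones.
old_vocab, new_vocab, dim = 10, 12, 8
embed = nn.Embedding(new_vocab, dim)

def mask_old_rows(grad):
    # Zero the gradient for pre-existing token rows; only the new
    # embeddings (rows old_vocab..new_vocab-1) get trained.
    grad = grad.clone()
    grad[:old_vocab] = 0.0
    return grad

embed.weight.register_hook(mask_old_rows)

# Dummy forward/backward touching every row.
loss = embed(torch.arange(new_vocab)).sum()
loss.backward()
print(embed.weight.grad[:old_vocab].abs().sum())  # tensor(0.)
```

This keeps the optimizer state simple compared to splitting the embedding into two modules, at the cost of a small per-step gradient copy.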