artidoro / qlora

QLoRA: Efficient Finetuning of Quantized LLMs
https://arxiv.org/abs/2305.14314
MIT License

Adding new tokens causes performance and memory issues #214

Open artidoro opened 1 year ago

artidoro commented 1 year ago

Bug description:

Note that this bug does not affect the results reported in the paper: in our research code, we explicitly froze the embeddings after initializing the model. A temporary fix is to do the same and freeze the embeddings (see the sketch below). This is not satisfactory, however, for use cases where new tokens need to be added and their representations tuned.
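A minimal sketch of that temporary workaround, assuming a LLaMA-style checkpoint loaded through `transformers` (the checkpoint name and the added `[PAD]` token are placeholders, not the exact setup used in this repo):

```python
# Temporary workaround sketch: resize the embedding matrix for newly added
# tokens, then freeze both the input embeddings and the output projection so
# they receive no gradient updates during QLoRA finetuning.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "huggyllama/llama-7b"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Add new tokens and grow the embedding matrix accordingly.
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
model.resize_token_embeddings(len(tokenizer))

# Freeze the input embeddings and the output projection (lm_head).
for param in model.get_input_embeddings().parameters():
    param.requires_grad = False
for param in model.get_output_embeddings().parameters():
    param.requires_grad = False
```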

A more general fix would be to add LoRA layers to the embeddings, or to allow only the newly added embeddings to be trained (a sketch follows below). The LoRA layer for embeddings might not work as well on the output projection layer (the lm_head that maps hidden states back to the vocabulary before the softmax).
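A hedged sketch of the more general fix, using PEFT to attach LoRA adapters to the embedding layer in addition to the attention projections. The module names (`embed_tokens`, `lm_head`, `q_proj`, etc.) assume a LLaMA-style architecture and would need to be adjusted for other model families; this is not the exact configuration shipped in this repo:

```python
# Sketch: target the embedding layer with LoRA so new token representations
# can be tuned alongside the usual attention adapters.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
    # Attention projections plus the (LLaMA-style) input embedding layer.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "embed_tokens"],
    # Alternative: train full, saved copies of the embedding matrices instead
    # of LoRA factors on them:
    # modules_to_save=["embed_tokens", "lm_head"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Restricting training to only the newly added rows of the embedding matrix would instead require masking the gradients of the pre-existing rows (e.g. with a backward hook); the PEFT route above is simply the least invasive option with existing APIs.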

apachemycat commented 1 year ago

Very good patch!

victox5 commented 1 year ago

@artidoro is there a plan to implement fine-tuning of new tokens in the vocabulary?

Thanks for the great repo!

kongjiellx commented 1 year ago

Can you explain why "The LoRA layer for embeddings might not work as well on the output projection layer"? The problem I recently encountered may be related to this. Thanks a lot.

chenjiasheng commented 1 year ago

Need this feature +1