Following #155, I added 100 new tokens to the vocabulary and the corresponding embeddings. I want to fine-tune only these new embeddings and keep the original tokens' pretrained weights frozen. Here is my code:
```python
model_engine.backward(total_loss)

if args.nums_token and args.mulgpu_numtoken and args.token_detach:
    # gradient mask over the embedding rows: 0 = frozen, 1 = trainable
    textembeds_masks = torch.zeros_like(model_engine.in_adaptor.text_embed.weight).to(device=model_engine.local_rank)
    # textembeds_masks[VOCAB_SIZE_SRC + 1, :] = 1
    with torch.no_grad():
        for p_name, param in model_engine.named_parameters():
            # mask the gradients of both embedding tables before the optimizer step
            if "llm_model.base_model.model.model.embed_tokens.weight" in p_name:
                if param.grad is not None:
                    param.grad.copy_(param.grad.data * textembeds_masks)
            if "in_adaptor.text_embed.weight" in p_name:
                if param.grad is not None:
                    param.grad.copy_(param.grad.data * textembeds_masks)

model_engine.step()
```
`in_adaptor.text_embed.weight` is initialized from `llm_model.base_model.model.model.embed_tokens.weight`. After training for a while, the original tokens' rows of `in_adaptor.text_embed.weight` differ between saved checkpoints. How should I change my code so that the original tokens' weights stay fixed and only the new tokens' embeddings are fine-tuned?
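For reference, this is the behaviour I am trying to get, shown in isolation with a plain `nn.Embedding` and a gradient hook instead of the DeepSpeed loop above (a minimal sketch; `VOCAB_SIZE_SRC`, `NUM_NEW_TOKENS`, and the toy loss are placeholders, not my real values):

```python
import torch
import torch.nn as nn

VOCAB_SIZE_SRC = 32000   # placeholder: size of the pretrained vocabulary
NUM_NEW_TOKENS = 100     # placeholder: number of newly added tokens

# toy embedding table standing in for in_adaptor.text_embed / embed_tokens
embed = nn.Embedding(VOCAB_SIZE_SRC + NUM_NEW_TOKENS, 64)

# 0 for pretrained rows (frozen), 1 for the newly added rows (trainable)
grad_mask = torch.zeros_like(embed.weight)
grad_mask[VOCAB_SIZE_SRC:, :] = 1.0

# zero the pretrained rows' gradients automatically on every backward pass
embed.weight.register_hook(lambda grad: grad * grad_mask)

# plain SGD without momentum or weight decay, so zero-gradient rows cannot move
optimizer = torch.optim.SGD(embed.parameters(), lr=1e-3)

before = embed.weight[:VOCAB_SIZE_SRC].detach().clone()

tokens = torch.randint(0, VOCAB_SIZE_SRC + NUM_NEW_TOKENS, (8,))
loss = embed(tokens).pow(2).mean()   # toy loss, just to produce gradients
loss.backward()
optimizer.step()

# pretrained rows are unchanged after the step; only the new rows may change
assert torch.equal(embed.weight[:VOCAB_SIZE_SRC], before)
```

In the sketch the hook replaces my manual loop over `named_parameters()`, and the assert only holds for plain SGD: optimizers with momentum or weight decay can still move rows whose gradient is zero.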