Hi @LithurshanK! The current code repo does not include this part, but you can manually perform the linear interpolation as follows:
```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("../models/intfloat/e5-base-v2")
tokenizer = AutoTokenizer.from_pretrained("../models/intfloat/e5-base-v2")

original_pos_len = 512
target_pos_len = 8192
hidden_size = 768
factor = target_pos_len // original_pos_len  # 16

original_pos_embeddings = model.embeddings.position_embeddings
new_pos_embeddings = nn.Embedding(target_pos_len, hidden_size)

# Copy each original embedding to its stretched position (idx -> idx * factor).
for idx in range(original_pos_len):
    new_pos_embeddings.weight.data[idx * factor, :] = original_pos_embeddings.weight.data[idx, :].clone()

# Positions beyond the last anchor have no right neighbor to interpolate toward,
# so fill them with the last original embedding.
new_pos_embeddings.weight.data[(original_pos_len - 1) * factor:, :] = original_pos_embeddings.weight.data[-1, :].clone()

# Linearly interpolate the positions between consecutive anchors.
for idx in range(original_pos_len - 1):
    for j in range(factor - 1):
        new_pos_embeddings.weight.data[idx * factor + j + 1, :] = (original_pos_embeddings.weight.data[idx, :] * (factor - j - 1) + original_pos_embeddings.weight.data[idx + 1, :] * (j + 1)).clone() / factor

model.config.max_position_embeddings = target_pos_len
model.embeddings.position_embeddings = new_pos_embeddings

model.save_pretrained("../models/dwzhu/e5-base-pi-8k")
tokenizer.save_pretrained("../models/dwzhu/e5-base-pi-8k")
```
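For reference, here is a minimal, untested sketch of how the saved checkpoint could be loaded and used on an input longer than the original 512-token limit. The local path is the one from the snippet above; the `"passage: "` prefix and mean pooling follow the usual E5 usage, and the explicit `max_length` is passed because the tokenizer config may still carry the original `model_max_length` of 512.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed local path: the checkpoint saved by the interpolation snippet above.
model = AutoModel.from_pretrained("../models/dwzhu/e5-base-pi-8k")
tokenizer = AutoTokenizer.from_pretrained("../models/dwzhu/e5-base-pi-8k")
model.eval()

# The tokenizer may still report the original 512-token model_max_length,
# so set max_length explicitly when tokenizing long inputs.
text = "passage: " + "a long document " * 2000
batch = tokenizer(text, max_length=8192, truncation=True, return_tensors="pt")

with torch.no_grad():
    out = model(**batch)

# Mean-pool the last hidden state over non-padding tokens, as in the usual E5 recipe.
mask = batch["attention_mask"].unsqueeze(-1).float()
embedding = (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(batch["input_ids"].shape, embedding.shape)
```

Note that, depending on your transformers version, the model may also keep a persistent `embeddings.position_ids` buffer of the original length; if loading complains about a size mismatch there, that buffer needs to be extended to `target_pos_len` before saving as well.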
Hi, does the code repo include the code for interpolation without fine-tuning E5?