Open drewskidang opened 4 months ago
Same question; I'm using a BGE embedding model, by the way.
`FlagEmbedding\baai_general_embedding\finetune\data.py`, line 73:

```python
def padding_score(self, teacher_score):
    group_size = None
    for scores in teacher_score:
        if scores is not None:
            group_size = len(scores)
            break
    if group_size is None:
        return None

    padding_scores = [100.0] + [0.0] * (group_size - 1)
    new_teacher_score = []
    for scores in teacher_score:
        if scores is None:
            new_teacher_score.append(padding_scores)
        else:
            new_teacher_score.append(scores)
    return new_teacher_score
```
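For anyone following along, here is a minimal standalone sketch of that padding logic (lifted out of the class so it runs on its own), showing what it produces when some groups have no teacher scores:

```python
# Standalone copy of the padding_score logic from data.py, for illustration only.
def padding_score(teacher_score):
    # Find the group size from the first group that actually has scores.
    group_size = None
    for scores in teacher_score:
        if scores is not None:
            group_size = len(scores)
            break
    # No group has teacher scores at all: nothing to distill from.
    if group_size is None:
        return None
    # Groups without teacher scores get a dummy distribution that puts all
    # weight on the first (positive) passage.
    padding_scores = [100.0] + [0.0] * (group_size - 1)
    return [padding_scores if scores is None else scores
            for scores in teacher_score]

# One group is missing teacher scores, one has them.
print(padding_score([None, [3.2, 0.1, 0.05]]))
# [[100.0, 0.0, 0.0], [3.2, 0.1, 0.05]]
```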
You can use bge-reranker-v2 to compute scores for pos and neg, and use bge-m3 script to fine-tune models via distillation: https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/unified_finetune#2-data-format
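Assuming the unified fine-tune data format described in the linked README, each training example would be one JSON line carrying the reranker-produced scores alongside the passages. A sketch of building such a record (the passage texts and score values below are placeholders, and the exact field names should be checked against the linked README):

```python
import json

# Hypothetical scores you would get from running bge-reranker-v2 on
# (query, passage) pairs; here they are hard-coded placeholders.
record = {
    "query": "what is knowledge distillation",
    "pos": ["Knowledge distillation transfers knowledge from a teacher model ..."],
    "neg": ["Unrelated passage about databases ...",
            "Another unrelated passage ..."],
    "pos_scores": [7.8],      # one teacher score per positive passage
    "neg_scores": [-2.1, -3.4],  # one teacher score per negative passage
}

# Each record becomes one line of the JSONL training file.
line = json.dumps(record)
print(line)
```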
Regarding `padding_score` in `FlagEmbedding\baai_general_embedding\finetune\data.py` (line 73): this feature hasn't been fully implemented for bge embedding, has it? I couldn't find any follow-up code that uses it. I see that bge-m3 does support it.
I have a training dataset with query, pos, and neg. Is there a script that includes knowledge distillation for scoring the pos and neg passages?