adol001 opened this issue 3 months ago
Hello, the ndcg@10 of bge-m3-Dense on the MLDR zh dev split is 0.260, which is consistent with our paper: the values reported in the paper were all multiplied by 100, so 0.260 corresponds to the reported 26.0.
@hanhainebula Based on the MLDR results, using an 8k window for Chinese embeddings with bge-m3 does not seem to have practical value. Would it be better if I reduced the training material to 1k? How should I trim the MLDR dataset so that only texts within a 1k window are tested?
If you have already tested MLDR with a 1k window for Chinese, I sincerely hope you can share the results here.
We didn't evaluate the results with max_passage_length=1024. You can set max_passage_length=1024 and run the evaluation to get the corresponding results:
python FlagEmbedding/C_MTEB/MLDR/mteb_dense_eval/eval_MLDR.py \
--encoder /data/models/bge-m3 --languages zh \
--results_save_path /data/models/mldr_results \
--max_query_length 512 --max_passage_length 1024 \
--batch_size 256 --corpus_batch_size 8 \
--pooling_method cls --normalize_embeddings True \
--add_instruction False --overwrite True
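For context, max_passage_length truncates each corpus passage to its first N tokens before encoding, so setting it to 1024 already evaluates a 1k window without editing the dataset. If you do want to build a trimmed 1k-window copy of the MLDR zh corpus yourself, here is a minimal sketch; note that the "Shitao/MLDR" config name ("corpus-zh"), the "corpus" split, and the "text" column are assumptions based on the dataset card, not code from this repo:

# Minimal sketch: trim each MLDR zh corpus passage to a 1k-token window.
# Assumptions (not from this repo): the corpus lives at "Shitao/MLDR" with
# config "corpus-zh", split "corpus", and a "text" column; adjust these to
# match the actual dataset card.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")

def truncate_to_window(text, max_tokens=1024):
    # Keep only the first max_tokens tokens, then decode back to a string.
    ids = tokenizer(text, truncation=True, max_length=max_tokens,
                    add_special_tokens=False)["input_ids"]
    return tokenizer.decode(ids, skip_special_tokens=True)

corpus = load_dataset("Shitao/MLDR", "corpus-zh", split="corpus")
corpus_1k = corpus.map(lambda ex: {"text": truncate_to_window(ex["text"])})
corpus_1k.save_to_disk("mldr_zh_corpus_1k")  # point your evaluation at this copy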
Regarding the bge-m3 MLDR score for lang zh: with my current setup, the obtained ndcg_at_10 score is only 0.26017, which is significantly different from what is reported in the paper. Why might this be the case?