FlagOpen / FlagEmbedding

Retrieval and Retrieval-augmented LLMs
MIT License

Question about "BGE Landmark Embedding: A Chunking-Free Embedding Method For Retrieval Augmented Long-Context Large Language Models" #1048

Open lijiaoyang opened 1 month ago

lijiaoyang commented 1 month ago

Very good paper. Could you give a more detailed explanation of the "basic training form of dense retrieval" mentioned in the Distant supervision section on page 5? Is the model trained on the queries and answers of MS MARCO? Do the passages participate in the training? Also, does the query also have only one LMK?

kunlun531 commented 1 month ago

Thanks for your attention. During the distant supervision stage, the model is trained on queries, positive passages, and negative passages from MS MARCO. The detailed data can be found at https://huggingface.co/datasets/Shitao/bge-m3-data. Yes, the query also has only one LMK.
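For readers unfamiliar with this setup, here is a minimal sketch of what training on (query, positive passage, negative passages) triples commonly looks like in dense retrieval: a contrastive (InfoNCE-style) loss over similarity scores. This is an assumption about the general form, not the exact loss confirmed in this thread or in the FlagEmbedding code. The function name `info_nce_loss`, the temperature value 0.05, and the toy 2-d embeddings are all hypothetical and chosen only for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(query, positive, negatives, temperature=0.05):
    """Contrastive loss for one (query, positive, negatives) triple.

    Assumption: similarity is the inner product of embeddings, scaled by a
    hypothetical temperature. The actual model may differ.
    """
    # Similarity logits; the positive passage sits at index 0.
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability assigned to the positive passage.
    return log_sum_exp - logits[0]


# Toy embeddings (hypothetical): the positive is close to the query.
query = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

loss_good = info_nce_loss(query, positive, negatives)
# Swapping in a dissimilar passage as the "positive" gives a much larger loss.
loss_bad = info_nce_loss(query, negatives[0], [positive, negatives[1]])
print(loss_good < loss_bad)
```

The loss is small when the query embedding scores the positive passage above the negatives, which is the signal that pulls the query (or, in this paper's setting, its single LMK embedding) toward the relevant text.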

lijiaoyang commented 1 month ago

Hello! Is the temperature coefficient α of the position weight w mentioned in the paper a fixed value?