FlagOpen / FlagEmbedding

Retrieval and Retrieval-augmented LLMs
MIT License

Question about "BGE Landmark Embedding: A Chunking-Free Embedding Method For Retrieval Augmented Long-Context Large Language Models" #1048

Open lijiaoyang opened 3 months ago

lijiaoyang commented 3 months ago

Very good paper. Could you give a more detailed description of the "basic training form of dense retrieval" mentioned in the Distant supervision section on page 5? Is the model trained on the queries and answers of MS MARCO? Do the passages participate in training? Also, does the query have only one LMK as well?

kunlun531 commented 3 months ago

Thanks for your attention. During the distant supervision stage, the model is trained on queries, positive passages, and negative passages from MS MARCO. The detailed data can be found at https://huggingface.co/datasets/Shitao/bge-m3-data. Yes, the query also has only one LMK.
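
For readers unfamiliar with this setup, here is a minimal sketch of what training on (query, positive passage, negative passages) usually looks like in dense retrieval. It assumes an InfoNCE-style contrastive loss with dot-product similarity and a temperature; the thread does not confirm that the paper uses exactly this loss, and the function names, toy vectors, and default temperature are hypothetical.

```python
import math


def dot(u, v):
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(query, positive, negatives, temperature=1.0):
    """Contrastive loss for one (query, positive, negatives) example.

    Assumption: similarity is a dot product and the loss is the negative
    log-softmax of the positive score over all candidate passages.
    """
    scores = [dot(query, positive)] + [dot(query, n) for n in negatives]
    logits = [s / temperature for s in scores]
    # Numerically stable log-sum-exp over the positive and all negatives.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]


# Toy embeddings (hypothetical): the query matches the positive passage.
q = [1.0, 0.0]
pos = [1.0, 0.0]
negs = [[0.0, 1.0], [-1.0, 0.0]]

loss = info_nce_loss(q, pos, negs)
# Treating a negative as the positive should give a larger loss.
swapped_loss = info_nce_loss(q, negs[0], [pos, negs[1]])
```

In this sketch the loss shrinks as the query embedding scores the positive passage above the negatives, which is the sense in which the passages participate in training.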

lijiaoyang commented 3 months ago

Hello! Is the temperature coefficient α of the position weight w mentioned in the paper a fixed value?