Dear Ang Lv,
Thanks for your interest in our work and for bringing your paper to our attention. We are sorry to have missed it and we will update the next revision of our paper accordingly.
However, although both papers experiment with some version of bidirectional attention and perform fine-tuning with masked token prediction, there are several differences between your work and ours:

- Our work is a simple, lightweight approach to text representation, whereas the mentioned work focuses on text generation.
- LLM2Vec enables bidirectional attention simply by disabling the causal mask (see the sketch below), and unlike the mentioned work we do not change the positional encodings of the base model.
- We consider MNTP a general transformation that teaches the model how to use bidirectional attention, whereas the mentioned work fine-tunes bidirectional attention for one very specific task.
- Our work focuses on word-level and sentence-level embedding tasks, and we analyze the impact of bidirectional attention and MNTP on popular downstream tasks.
- We additionally conduct a comprehensive analysis of the impact of bidirectional attention across several popular LLMs.
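For concreteness, here is a minimal PyTorch sketch of the two ingredients mentioned above: replacing the causal attention mask with a bidirectional one, and constructing MNTP targets in which a masked token is predicted from the representation of the preceding position. This is only an illustration, not our training code; the mask token id, the masked positions, and the random logits standing in for a model's output are arbitrary choices for the example.

```python
import torch
import torch.nn.functional as F

seq_len, vocab_size, mask_token_id = 8, 100, 3

# (1) Causal vs. bidirectional self-attention masks (True = attention allowed).
# Disabling the causal mask amounts to using the all-True matrix below.
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
bidirectional_mask = torch.ones(seq_len, seq_len, dtype=torch.bool)

# (2) MNTP-style targets on a toy sequence.
input_ids = torch.randint(4, vocab_size, (1, seq_len))
is_masked = torch.zeros(1, seq_len, dtype=torch.bool)
is_masked[:, [2, 5]] = True                       # mask two positions as an example
masked_input = input_ids.masked_fill(is_masked, mask_token_id)

labels = input_ids.clone()
labels[~is_masked] = -100                         # only masked positions are scored

# Stand-in for the decoder's output logits over the vocabulary; in practice the
# masked input and the bidirectional mask are fed to the model to obtain these.
logits = torch.randn(1, seq_len, vocab_size)

# "Next token" flavour: the masked token at position i is predicted from the
# representation at position i-1, so logits are shifted by one w.r.t. labels.
loss = F.cross_entropy(
    logits[:, :-1, :].reshape(-1, vocab_size),
    labels[:, 1:].reshape(-1),
    ignore_index=-100,
)
print(masked_input, loss)
```

The shift by one keeps the objective close to the next token prediction the model was pretrained with, which is the sense in which MNTP combines masked language modeling with next token prediction.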
We hope our Figure 1 highlights these key points of our approach. Again, we thank you for your interest.
Thank you for your response. I would like to clarify that I acknowledge your contribution to text embedding; without doubt, this work benefits the entire community. I appreciate that you will add a discussion of MNTP. As a friendly reminder, I'd like to point out that in https://arxiv.org/pdf/2311.05296.pdf the authors also “enable bidirectional attention by simply disabling the causal masks” and obtain an embedding model. I hope this additional information is helpful for your paper.
Thank you for the pointer. We will discuss differences and similarities with BeLLM in our final version of the paper.
In addition, another paper, published in October 2023 (even before the previously mentioned paper), proposed removing the causal mask to enable bidirectional attention; the arXiv link can be found in Label Supervised LLaMA. Building on that paper, another paper from January 2024 further discussed the choice of attention masks when using an LLM as an encoder (link to the paper).
Congratulations on your impressive work! However, I have some concerns regarding the claim made in the paper about MNTP being "a novel training objective that combines next token prediction with masked language modeling."
It appears that the same training method was proposed in the paper https://arxiv.org/pdf/2311.07468.pdf. Furthermore, the overview figure in that paper bears significant resemblance to yours.
I hope that the claim of novelty will be reconsidered and that a discussion of the related paper will be included in your work. Looking forward to your response.