Open shuiyigt opened 1 month ago
@yuzc19
Hi @shuiyigt. Thanks for your insightful question.
Practically, I think you can use a decoder-only model in a "FiDAtt" way. However, in my opinion, whether it works well depends heavily on how good the model's sentence representations are when taken from the last token, since our motivation here is to leverage the attention scores to gauge the whole document's helpfulness.
The decoder-only model is generally pretrained with the objective of next-token prediction, so its sentence representation abilities may not be as good as those of an encoder. It is feasible to plug in an additional encoder to assist an existing decoder-only model. I think adapting AAR to this framework can be very interesting.
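To make the "FiDAtt"-style idea above concrete, here is a toy sketch (not the AAR code; the vectors and function names are made up for illustration): each retrieved document is encoded separately, its last-token hidden state serves as the document representation, and a query vector attends over those representations to score each document's helpfulness.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def score_docs(query_vec, doc_last_token_states):
    """Score each document by the query's attention over its
    last-token hidden state (dot product, then softmax)."""
    scores = [sum(q * h for q, h in zip(query_vec, state))
              for state in doc_last_token_states]
    return softmax(scores)

# Toy 4-dim "last-token hidden states", one per retrieved document.
query = [1.0, 0.0, 1.0, 0.0]
docs = [
    [0.9, 0.1, 0.8, 0.0],   # well aligned with the query
    [0.0, 1.0, 0.1, 0.9],   # poorly aligned
]
weights = score_docs(query, docs)
# The first document receives the larger attention weight.
```

Whether this works in practice hinges on the concern above: a next-token-prediction objective does not guarantee that the last token summarizes the whole document well.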
There are also other ways to identify the decoder-only model's preferences over the retrieved documents, like REPLUG. We focus on a different signal, but a detailed comparison is also welcome.
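For contrast, a REPLUG-LSR-style preference signal can be sketched as follows (a toy version with made-up numbers, not REPLUG's actual code): the LM's preference over retrieved documents is the softmax of the per-document log-likelihood of the ground-truth answer, which then supervises the retriever.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical per-document log-likelihoods of the gold answer when the
# LM is conditioned on (doc_i + question). Higher = the doc helped more.
answer_logprobs = [-2.1, -5.3, -3.0]

# The LM's preference distribution over documents: the supervision
# target for the retriever in a REPLUG-LSR-style setup.
lm_preference = softmax(answer_logprobs)
# Doc 0, with the highest answer likelihood, gets the largest weight.
```

This signal comes from the LM's output likelihood rather than from its attention scores, which is the main difference from the approach discussed above.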
Please let me know if you have further questions.
I noticed that you use an encoder-decoder model (T5) rather than a decoder-only model as the source LLM, because you can "easily get each input doc's hidden_states separately". If I use a decoder-only model and encode each input doc separately, just like in the "FiDAtt" way, but take the last token's hidden state instead of the average of all tokens' hidden_states, will that work?