4AI / BeLLM

Code for BeLLM: Backward Dependency Enhanced Large Language Model for Sentence Embeddings (NAACL2024)
https://arxiv.org/abs/2311.05296
MIT License

confused with the pooling strategy? #5

Open rxqy opened 4 months ago

rxqy commented 4 months ago

Hi, I'm confused about the pooling strategy you use here.

For training, you use `avg` pooling: https://github.com/4AI/BeLLM/blob/9da9269e51d462535964d9bf82aaa14fa3ff6d7c/README.md?plain=1#L52

For evaluation, however, you don't specify any pooling flag here: https://github.com/4AI/BeLLM/blob/9da9269e51d462535964d9bf82aaa14fa3ff6d7c/README.md?plain=1#L99-L105 so it should fall back to the default value `cls`, right? https://github.com/4AI/BeLLM/blob/9da9269e51d462535964d9bf82aaa14fa3ff6d7c/eval_sts.py#L57

As for the paper, you mention that you use the representative word as the pivot, which should be the last non-padding token, right? So I'm wondering: which token should I use, or does it make no difference in a decoder-based model like LLaMA?
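
To make the question concrete, here is a minimal sketch (my own code, not from this repo) of the three candidates I mean, assuming a model that returns `last_hidden_state` of shape `(batch, seq_len, hidden)` plus an `attention_mask`; the `pool` helper and the strategy names are just illustrative:

```python
import torch


def pool(last_hidden_state, attention_mask, strategy="avg"):
    """Toy illustration of the three pooling strategies I'm asking about.

    last_hidden_state: (batch, seq_len, hidden)
    attention_mask:    (batch, seq_len), 1 for real tokens, 0 for padding
    """
    if strategy == "avg":
        # mean over non-padding tokens (what the training command seems to use)
        mask = attention_mask.unsqueeze(-1).float()
        return (last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
    if strategy == "cls":
        # first token (the default I see in eval_sts.py, if I read it correctly)
        return last_hidden_state[:, 0]
    if strategy == "last":
        # last non-padding token (the "pivot"/representative word from the paper?)
        last_idx = attention_mask.sum(dim=1).long() - 1
        batch_idx = torch.arange(last_hidden_state.size(0))
        return last_hidden_state[batch_idx, last_idx]
    raise ValueError(f"unknown strategy: {strategy}")
```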