SeanLee97 / AnglE

Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard
https://arxiv.org/abs/2309.12871
MIT License
397 stars · 30 forks

Use of causal models for generation #82

Open dipankarsrirag opened 6 days ago

dipankarsrirag commented 6 days ago

This is amazing work. I have been working on something that requires me to evaluate the generated outputs of models like Mistral, using a prompt like: "Fill the [MASK] token in the sentence. Generate a single output."

Earlier, I would simply instruction fine-tune a Mistral model, but now I would like to explore the possibility of using these models with bi-directional attention.

I see that the library allows me to access the backbone model underneath, but it is not clear to me whether this model has bi-directional attention. Could you please clarify this? If it does, I could simply use the `backbone.generate()` function for my purpose.

Thanks in advance!

SeanLee97 commented 6 days ago

Hi @dipankarsrirag, thanks for your kind words. AnglE supports bi-directional LLMs.

If you want to train AnglE embeddings with bi-directional LLMs, you can refer to this documentation, under Examples / b. LLM-based.

If you just want to test the prompt with a bi-directional LLM, you can directly use our BiLLM toolkit: https://github.com/WhereIsAI/BiLLM. It is compatible with Hugging Face transformers.
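Roughly, a minimal sketch of that route could look like the following (the `BiLLM_START_INDEX` environment variable and the `billm.MistralForCausalLM` drop-in class are assumptions based on the BiLLM README, and the example prompt is illustrative only; please check the repository for the exact API):

```python
import os

# Assumption: BiLLM reads this variable at import time; 0 means attention
# becomes bi-directional from the first layer onwards.
os.environ['BiLLM_START_INDEX'] = '0'

from transformers import AutoTokenizer
from billm import MistralForCausalLM  # assumed drop-in replacement for the HF class

model_id = 'mistralai/Mistral-7B-Instruct-v0.2'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = MistralForCausalLM.from_pretrained(model_id)

prompt = "Fill the [MASK] token in the sentence. Generate a single output. I love [MASK] weather."
inputs = tokenizer(prompt, return_tensors='pt')
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```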

dipankarsrirag commented 6 days ago

Hi @SeanLee97, thanks for the quick reply. I have been working with AnglE for the past few hours now and just need a couple of clarifications:

  1. When I initialise a bidirectional LLM with AnglE like this:

     ```python
     angle = AnglE.from_pretrained(
         'mistralai/Mistral-7b-Instruct-v0.2',
         is_llm=True,
         apply_billm=True,
         billm_model_class="MistralForCausalLM",
         load_kbit=4,
         torch_dtype=torch.bfloat16,
         pooling_strategy="last",
         trust_remote_code=True,
     )
     ```

     would the model returned by `model = angle.backbone` have its attention changed to bidirectional?

  2. I have a mask-filling task where each input is a <masked_sentence, target_word> pair, which according to the documentation is in the Prompts.C format. But when I use the angle.fit() method for fine-tuning, I get an error saying that only the Prompts.A format is supported. This made me use the SFTTrainer with `model`. Is this correct? If not, how should I do it instead?

SeanLee97 commented 5 days ago

Hi @dipankarsrirag, here are the answers to your questions:

1) Yes. When you set `is_llm=True` and `apply_billm=True`, the backbone will be bi-directional (see the first sketch below).

2) The Prompts setting only applies to the inference phase. If you use angle-trainer and want to apply a prompt to all text columns during training, you can specify it via --prompt_template "Here is your custom prompt {text}". If you use custom code, you can assign a prompt to prompt_template in AngleDataTokenizer; see this documentation and the second sketch below. In other situations, for example when you only want to apply a prompt to a specific text column, please set the prompt manually, i.e., during preprocessing.
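For question 1, here is a minimal sketch of pulling out the bi-directional backbone and calling the standard transformers generate() on it. The initialization parameters are copied from the question above; the example prompt is illustrative only, and since the attention mask is no longer causal, generation quality should be checked empirically:

```python
import torch
from angle_emb import AnglE

# Same initialization as in the question above.
angle = AnglE.from_pretrained(
    'mistralai/Mistral-7b-Instruct-v0.2',
    is_llm=True,
    apply_billm=True,
    billm_model_class='MistralForCausalLM',
    load_kbit=4,
    torch_dtype=torch.bfloat16,
    pooling_strategy='last',
    trust_remote_code=True,
)

# The backbone is a regular Hugging Face model whose attention has been
# patched to be bi-directional, so the usual generate() API is available.
backbone = angle.backbone
tokenizer = angle.tokenizer

prompt = "Fill the [MASK] token in the sentence. Generate a single output. I love [MASK] weather."
inputs = tokenizer(prompt, return_tensors='pt').to(backbone.device)
outputs = backbone.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```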
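For question 2, here is a minimal sketch of assigning the prompt at the data-tokenization step via prompt_template. The toy dataset, column names, and fit() arguments are placeholder assumptions; please check the documentation for the exact dataset format that angle.fit() expects:

```python
from datasets import Dataset
from angle_emb import AnglE, AngleDataTokenizer

# Hypothetical toy pairs: a masked sentence and its target word.
ds = Dataset.from_dict({
    'text': ['I love [MASK] weather.', 'The cat sat on the [MASK].'],
    'positive': ['sunny', 'mat'],
})

angle = AnglE.from_pretrained(
    'mistralai/Mistral-7b-Instruct-v0.2',
    is_llm=True,
    apply_billm=True,
    billm_model_class='MistralForCausalLM',
    pooling_strategy='last',
)

# The prompt is applied to the text columns when the dataset is tokenized;
# {text} is the placeholder mentioned above.
template = 'Fill the [MASK] token in the sentence. Generate a single output. {text}'
train_ds = ds.map(
    AngleDataTokenizer(angle.tokenizer, angle.max_length, prompt_template=template),
    num_proc=1,
)

angle.fit(train_ds=train_ds, output_dir='ckpts/mask-filling', batch_size=2, epochs=1)
```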