dipankarsrirag opened this issue 6 days ago
Hi @dipankarsrirag, thanks for your kind words. AnglE supports bi-directional LLMs.
If you want to train an AnglE embedding with a bi-directional LLM, you can refer to this documentation, under Examples/b.LLM-based.
If you just want to test the prompt with a bi-directional LLM, you can directly use our BiLLM toolkit: https://github.com/WhereIsAI/BiLLM. It is compatible with HuggingFace Transformers.
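Roughly, using BiLLM on its own looks like the sketch below. Please double-check the BiLLM README for the exact interface; the `BiLLM_START_INDEX` environment variable and the patched class name are written from memory here:

```python
import os

# assumption: bidirectional attention is enabled from this layer index onward
os.environ['BiLLM_START_INDEX'] = '0'

from transformers import AutoTokenizer
# assumption: billm ships drop-in replacements for the HF model classes
from billm import MistralForCausalLM

model_id = 'mistralai/Mistral-7B-Instruct-v0.2'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = MistralForCausalLM.from_pretrained(model_id)  # bidirectional variant
```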
Hi @SeanLee97, thanks for the quick reply. I have been working with AnglE for the past few hours now. Just need a clarification:
When I initialise a bidirectional LLM with AnglE like this:
```python
import torch
from angle_emb import AnglE

angle = AnglE.from_pretrained(
    'mistralai/Mistral-7B-Instruct-v0.2',
    is_llm=True,
    apply_billm=True,
    billm_model_class='MistralForCausalLM',
    load_kbit=4,
    torch_dtype=torch.bfloat16,
    pooling_strategy='last',
    trust_remote_code=True,
)
```
would the model returned by `model = angle.backbone` have its attention changed to bidirectional?
I have a mask-filling task, with each input being a `<masked_sentence, target_word>` pair, which according to the documentation is in the `Prompts.C` format. But when I use the `angle.fit()` method for fine-tuning, I get an error saying that only the `Prompts.A` format is supported. This made me use the `SFTTrainer` with `model`. Is this correct? If not, how would I do it otherwise?
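For reference, this is roughly what I ended up doing (a sketch; the dataset, column names, and output path are placeholders, and the exact `trl` arguments may differ by version):

```python
from trl import SFTConfig, SFTTrainer

# train_ds is a placeholder datasets.Dataset with a 'text' column that already
# contains the prompt, the masked sentence, and the target word.
trainer = SFTTrainer(
    model=model,                 # model = angle.backbone from above
    train_dataset=train_ds,
    args=SFTConfig(
        output_dir='mistral-billm-maskfill',   # hypothetical output path
        per_device_train_batch_size=4,
    ),
)
trainer.train()
```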
Hi @dipankarsrirag, here are the answers to the questions:
1) Yes. When you set `is_llm=True` and `apply_billm=True`, the backbone will be bi-directional.
2) The `Prompts` setting only works for the inference phase. If you use `angle-trainer` and want to apply a prompt to all text columns in the training stage, you can specify it via `--prompt_template "Here is your custom prompt {text}"`. If you use custom code, you can assign a prompt to `prompt_template` in `AngleDataTokenizer`; see this documentation. In other situations, for example applying a prompt to only one specific text column, please set the prompt manually, i.e., do it in the preprocessing. A rough sketch of the custom-code route is below.
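Here the data file and column names are just placeholders, and the sketch reuses the `angle` object you created above:

```python
from datasets import load_dataset
from angle_emb import AngleDataTokenizer

# hypothetical training file with the text columns expected by AnglE
ds = load_dataset('json', data_files='train.jsonl')['train']

# assign a prompt template so it is applied to the text columns at tokenization time
train_ds = ds.shuffle().map(
    AngleDataTokenizer(angle.tokenizer, angle.max_length,
                       prompt_template='Here is your custom prompt {text}'),
    num_proc=8,
)

angle.fit(train_ds=train_ds, output_dir='ckpts', batch_size=32, epochs=1)
```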
This is amazing work. I have been working on something that would require me to evaluate the generated outputs of models like Mistral, using a prompt like:
"Fill the [MASK] token in the sentence. Generate a single output."
Earlier, I would simply instruction fine-tune a Mistral model, but I would like to explore the possibility of using these models with bi-directional attention.
I see that the library allows me to access the `backbone` model underneath, but it is not clear to me whether this model has bi-directional attention. Can you please clarify this? If it does, I could simply use the `backbone.generate()` function for my purpose. Thanks in advance!
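To make my intent concrete, here is a rough sketch of what I would like to do (the example sentence is made up, and I am assuming the backbone keeps the standard HuggingFace `generate()` API):

```python
# assumption: `angle` has already been loaded via AnglE.from_pretrained(...)
# with an LLM backbone, so angle.backbone is a regular HF causal LM
backbone = angle.backbone
tokenizer = angle.tokenizer

prompt = ('Fill the [MASK] token in the sentence. Generate a single output.\n'
          'Sentence: The cat sat on the [MASK].')   # made-up example input
inputs = tokenizer(prompt, return_tensors='pt').to(backbone.device)
output_ids = backbone.generate(**inputs, max_new_tokens=5)

# decode only the newly generated tokens, not the prompt
print(tokenizer.decode(output_ids[0][inputs['input_ids'].shape[1]:],
                       skip_special_tokens=True))
```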