Closed mdabedr closed 6 months ago
Hi @mdabedr, thanks for following our work.
To solve this problem, set train_mode=True when initializing angle.
By the way:
1) LLaMA-based models do not need an explicit cuda() call.
2) We recommend training LLaMA-based models in 4-bit (load_kbit=4).
3) For parameters, compare w2=1.0 and w2=35.0 in loss_kwargs. Both values work well in practice.
You can initialize angle in your experiment as follows:
angle = AnglE.from_pretrained('NousResearch/Llama-2-7b-hf',
                              pretrained_lora_path='SeanLee97/angle-llama-7b-nli-v2',
                              load_kbit=4,
                              train_mode=True)
Hi @mdabedr, sorry, I forgot another important thing. You have to specify the corresponding prompt (Prompts.A for SeanLee97/angle-llama-7b-nli-v2) when fine-tuning our LLaMA-based models, as follows:
from angle_emb import Prompts
train_ds = ds['train'].shuffle().map(AngleDataTokenizer(angle.tokenizer, angle.max_length, prompt_template=Prompts.A), num_proc=8)
valid_ds = ds['valid'].map(AngleDataTokenizer(angle.tokenizer, angle.max_length, prompt_template=Prompts.A), num_proc=8)
test_ds = ds['test'].map(AngleDataTokenizer(angle.tokenizer, angle.max_length, prompt_template=Prompts.A), num_proc=8)
Hi. Thank you so much for replying. Can you tell me how to test the trained model?
You can try this:
angle = AnglE.from_pretrained(
    'NousResearch/Llama-2-7b-hf',
    pretrained_lora_path='your_custom_model_path',
    apply_bfloat16=True)
print(angle.evaluate(test_ds, device=angle.device))
Remember to set apply_bfloat16=True or load_kbit=16 for LLaMA when evaluating.
How can I use w2=1.0 and w2=35.0? Did you mean w1=1.0 and w2=35.0?
No, I mean w2=1.0 or w2=35.0. Try each value in a separate run:
angle.fit(
    ...
    loss_kwargs={
        'w2': 1.0,
    })
or
angle.fit(
    ...
    loss_kwargs={
        'w2': 35.0,
    })
Hello, I get the error "RuntimeError: No GPU found. A GPU is needed for quantization." Could you tell me your requirements, such as the Python, PyTorch, and CUDA versions?
Thank you so much. Appreciate you replying so quick!!
Hi @rickeyhhh, may I know your GPU version?
Below is my environment:
GPU: 3090 Ti
CUDA: 12.2
Python libraries:
bitsandbytes 0.41.1
torch 2.0.1
torchvision 0.15.2a0
transformers 4.34.0
Thanks for your reply. I have already solved this problem by matching the torch and CUDA versions.
I then hit another problem, "ZeroDivisionError: integer division or modulo by zero", which I solved by using a lower version of bitsandbytes (0.39.0).
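When comparing environments like the ones posted in this thread, it can help to print the installed versions in one go. Below is a minimal sketch using only the standard library. The helper name report_versions is hypothetical, and the package list simply mirrors the libraries mentioned above:

```python
from importlib import metadata


def report_versions(packages):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    packages = ["torch", "torchvision", "transformers", "bitsandbytes"]
    for name, version in report_versions(packages).items():
        print(f"{name}: {version or 'not installed'}")
```

If torch is installed, torch.cuda.is_available() and torch.version.cuda additionally show whether PyTorch can see a GPU and which CUDA version it was built against, which is the mismatch described above.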
Hello @SeanLee97
Can you please give some insights on how to fine-tune the model for retrieval?
Hi @mdabedr
You can follow this script to fine-tune your model: https://github.com/SeanLee97/AnglE/blob/angle-bellm/examples/UAE/train.py
There are two steps:
1) you need to prepare your data into jsonl, as follows:
{"text1": "here is text1", "text2": "here is text2", "label": 0/1}
{"text1": "here is text1", "text2": "here is text2", "label": 0/1}
2) start to fine-tune your model, as follows:
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=1234 train.py \
--train_path your_custom_data.jsonl --save_dir ./ckpts/your-custom-model \
--model_name WhereIsAI/UAE-Large-V1 \
--w2 35 --learning_rate 5e-8 --maxlen 128 \
--epochs 2 \
--batch_size 32 \
--apply_lora 0 \
--save_steps 1000 --seed -1 --gradient_accumulation_steps 4 --fp16 1
Please set a small learning_rate, such as 1e-6, if you fine-tune based on WhereIsAI/UAE-Large-V1.
⚠️ If you would like to fine-tune a retrieval-based model, please specify the following argument
--prompt "Represent this sentence for searching relevant passages: {text}"
What I meant is: given a query Q and a set of all articles, we have to retrieve the subset of articles that can answer a yes-no question.
@mdabedr Given a query Q and a set of articles {A1, A2, ..., An}, suppose A1 and Ai are related to Q. The data can then be as follows:
{"text1": Q, "text2": A1, "label": 1}
{"text1": Q, "text2": A2, "label": 0}
...
{"text1": Q, "text2": Ai, "label": 1}
...
{"text1": Q, "text2": An, "label": 0}
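The layout above can be produced with a few lines of Python. This is only a sketch: the helper names build_pairs and write_jsonl, the example query, and the example articles are all made up for illustration, and only the text1/text2/label keys come from the format described in this thread.

```python
import json


def build_pairs(query, articles, relevant_indices):
    """Pair one query with every article; label 1 if relevant, else 0."""
    relevant = set(relevant_indices)
    return [
        {"text1": query, "text2": article, "label": int(i in relevant)}
        for i, article in enumerate(articles)
    ]


def write_jsonl(rows, path):
    """Write one JSON object per line (the JSONL format)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


# Hypothetical query and articles; only article 0 is related to the query.
rows = build_pairs(
    "Is the sky blue?",
    ["The sky appears blue because of light scattering.",
     "Bread is baked in an oven."],
    relevant_indices=[0],
)
```

Calling write_jsonl(rows, 'your_custom_data.jsonl') would then give a file that can be passed to --train_path.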
I see. Got it, thank you so much.
Can you please also give directions on how to test the trained model? The command and the data format?
@mdabedr The test data format is the same as the training. You can evaluate the performance using the following code:
angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pretrained_model_path='your_custom_model_path').cuda()
angle.evaluate(test_ds, device=angle.device)
For more training and evaluation details, see this Colab: https://colab.research.google.com/drive/1h28jHvv_x-0fZ0tItIMjf8rJGp3GcO5V?usp=sharing#scrollTo=5HYGEJebULjR
Would I need a specific prompt for the test?
@mdabedr Sure, the test should be consistent with the training. You can set the prompt via set_prompt(), as follows:
angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pretrained_model_path='your_custom_model_path').cuda()
angle.set_prompt(prompt='YOUR PROMPT HERE')
angle.evaluate(test_ds, device=angle.device)
I am getting a BrokenPipeError with the trained model. I trained the model, and for the test set I wrote the following to load the jsonl file:
import json
with open('./valid.jsonl') as f:
    data = [json.loads(line) for line in f]
d = {}
d["text1"] = []
d["text2"] = []
d["label"] = []
for i in data:
    d["text1"].append(i["text1"])
    d["text2"].append(i["text2"])
    d["label"].append(i["label"])
from datasets import Dataset, DatasetDict
ds_test = Dataset.from_dict(d)
from angle_emb import AnglE, AngleDataTokenizer, Prompts
angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pretrained_model_path='/home/abedrahman/test/ckpts/FT_UAE_Abed').cuda()
angle.set_prompt(prompt=prompt)
# test_ds = ds_test.map(AngleDataTokenizer(angle.tokenizer, angle.max_length, prompt_template=Prompts.C), num_proc=8) '''with prompts'''
# test_ds = ds_test.map(AngleDataTokenizer(angle.tokenizer, angle.max_length) num_proc=8) '''no prompts'''
test_ds = ds_test.map(AngleDataTokenizer(angle.tokenizer, angle.max_length), num_proc=8)
print(angle.evaluate(test_ds, device=angle.device))
The mapping step on test_ds fails and throws a BrokenPipeError.
Could you attach a screenshot of the error?
You also need to set a prompt_template here:
test_ds = ds_test.map(AngleDataTokenizer(angle.tokenizer, angle.max_length, prompt_template=Prompts.C), num_proc=8)
Sure, here they are
@mdabedr Try setting num_proc=0 to see the actual error.
Yeah, I tried just that before you gave me the tip, and the error stopped. Most likely something with multiprocessing.
In any case, one last question: how do you get the eval method to return the results and/or the predicted labels?
And I just want to say thank you so much for your quick reply. I really appreciate it.
Hi @mdabedr, we did not provide a function to predict labels (it seems like a classification task). We only provide a function to obtain the sentence embedding of the input text, i.e., angle.encode(texts). We evaluate the model using Spearman's correlation. You can refer to this evaluation function to see how it works: https://github.com/SeanLee97/AnglE/blob/main/angle_emb/angle.py#L807.
If you want to judge two input sentences as similar or dissimilar, you can compute their cosine similarity based on their sentence embeddings. If their similarity is high, you can treat them as a similar pair; otherwise, they are dissimilar. You can set a threshold for this.
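The thresholding idea can be sketched in plain Python. This is an illustration, not part of angle_emb: the function names and the 0.8 threshold are assumptions, and the short toy vectors stand in for real embeddings, which in practice would come from angle.encode(texts).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_similar(a, b, threshold=0.8):
    """Treat a pair as similar when its cosine similarity reaches the threshold.

    The default of 0.8 is an arbitrary placeholder; tune it on labelled
    validation pairs for your own data.
    """
    return cosine_similarity(a, b) >= threshold


# Toy vectors standing in for sentence embeddings.
emb_a = [1.0, 2.0, 3.0]
emb_b = [2.0, 4.0, 6.0]   # same direction as emb_a
emb_c = [3.0, 0.0, -1.0]  # orthogonal to emb_a

print(is_similar(emb_a, emb_b))
print(is_similar(emb_a, emb_c))
```

A reasonable way to pick the threshold is to sweep candidate values over a labelled validation set and keep the one that gives the best accuracy or F1.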
--prompt "Represent this sentence for searching relevant passages: {text}"
I am not sure about this part in particular, could you please explain this
@mdabedr, this is an argument of train.py, see this code: https://github.com/SeanLee97/AnglE/blob/angle-bellm/examples/UAE/train.py#L19
It will be assigned to prompt_template, see this: https://github.com/SeanLee97/AnglE/blob/angle-bellm/examples/UAE/train.py#L91
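To make the role of the {text} placeholder concrete, here is a small sketch. It assumes the template is filled in with ordinary Python string formatting; the helper apply_prompt and the sample sentence are hypothetical, and the linked train.py lines remain the reference for what the library actually does.

```python
PROMPT = "Represent this sentence for searching relevant passages: {text}"


def apply_prompt(template, text):
    """Substitute the input sentence for the {text} placeholder."""
    return template.format(text=text)


example = apply_prompt(PROMPT, "what is a sentence embedding?")
print(example)
```

The same template should be used at training and at evaluation time, so that the model sees inputs in a consistent form.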
I am trying to train my model using LLAMA-v2-nli. I was able to do so with the bert-nli model but when I try to run with LLAMA I get the following error:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
from angle_emb import AnglE, AngleDataTokenizer
angle = AnglE.from_pretrained('NousResearch/Llama-2-7b-hf', pretrained_lora_path='SeanLee97/angle-llama-7b-nli-v2').cuda()
train_ds = ds['train'].shuffle().map(AngleDataTokenizer(angle.tokenizer, angle.max_length), num_proc=8)
valid_ds = ds['valid'].map(AngleDataTokenizer(angle.tokenizer, angle.max_length), num_proc=8)
test_ds = ds['test'].map(AngleDataTokenizer(angle.tokenizer, angle.max_length), num_proc=8)
angle.fit(
    train_ds=train_ds,
    valid_ds=test_ds,
    output_dir='ckpts/sts-b',
    batch_size=16,
    epochs=5,
    learning_rate=2e-5,
    save_steps=100,
    eval_steps=1000,
    warmup_steps=0,
    gradient_accumulation_steps=1,
    loss_kwargs={
        'w1': 1.0,
        'w2': 1.0,
        'w3': 1.0,
        'cosine_tau': 20,
        'ibn_tau': 20,
        'angle_tau': 1.0
    },
    fp16=True,
    logging_steps=100
)
I used the same code for BERT (loading the BERT model instead) and it works with no issues.