SeanLee97 / AnglE

Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard
https://arxiv.org/abs/2309.12871
MIT License

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn #21

Closed mdabedr closed 6 months ago

mdabedr commented 6 months ago

I am trying to train my model using LLAMA-v2-nli. I was able to do so with the bert-nli model but when I try to run with LLAMA I get the following error:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

from angle_emb import AnglE, AngleDataTokenizer

angle = AnglE.from_pretrained('NousResearch/Llama-2-7b-hf', pretrained_lora_path='SeanLee97/angle-llama-7b-nli-v2').cuda()
train_ds = ds['train'].shuffle().map(AngleDataTokenizer(angle.tokenizer, angle.max_length), num_proc=8)
valid_ds = ds['valid'].map(AngleDataTokenizer(angle.tokenizer, angle.max_length), num_proc=8)
test_ds = ds['test'].map(AngleDataTokenizer(angle.tokenizer, angle.max_length), num_proc=8)

angle.fit(
    train_ds=train_ds,
    valid_ds=test_ds,
    output_dir='ckpts/sts-b',
    batch_size=16,
    epochs=5,
    learning_rate=2e-5,
    save_steps=100,
    eval_steps=1000,
    warmup_steps=0,
    gradient_accumulation_steps=1,
    loss_kwargs={
        'w1': 1.0,
        'w2': 1.0,
        'w3': 1.0,
        'cosine_tau': 20,
        'ibn_tau': 20,
        'angle_tau': 1.0
    },
    fp16=True,
    logging_steps=100
)

I used the same code for BERT (loading the BERT model instead) and it works with no issues.

SeanLee97 commented 6 months ago

Hi @mdabedr, thanks for following our work.

To solve this problem, set train_mode=True when initializing angle. By the way:

  1. LLaMA-based models do not need an explicit cuda() call.
  2. It is recommended to train LLaMA-based models in 4-bit (load_kbit=4).
  3. For the loss parameters, try both w2=1.0 and w2=35.0 in loss_kwargs. Both are practical settings that achieve good performance.

You can initialize angle in your experiment as follows:

angle = AnglE.from_pretrained('NousResearch/Llama-2-7b-hf',
                              pretrained_lora_path='SeanLee97/angle-llama-7b-nli-v2',
                              load_kbit=4,
                              train_mode=True)

Hi @mdabedr, sorry, I forgot another important thing. You have to specify the corresponding prompt (Prompts.A for SeanLee97/angle-llama-7b-nli-v2) when fine-tuning our LLaMA-based models, as follows:

from angle_emb import Prompts

train_ds = ds['train'].shuffle().map(AngleDataTokenizer(angle.tokenizer, angle.max_length, prompt_template=Prompts.A), num_proc=8)
valid_ds = ds['valid'].map(AngleDataTokenizer(angle.tokenizer, angle.max_length, prompt_template=Prompts.A), num_proc=8)
test_ds = ds['test'].map(AngleDataTokenizer(angle.tokenizer, angle.max_length, prompt_template=Prompts.A), num_proc=8)
mdabedr commented 6 months ago

Hi. Thank you so much for replying. Can you tell me how to test the trained model?

SeanLee97 commented 6 months ago

You can try this:

angle = AnglE.from_pretrained(
    'NousResearch/Llama-2-7b-hf',
    pretrained_lora_path='your_custom_model_path',
    apply_bfloat16=True)
print(angle.evaluate(test_ds, device=angle.device))

Remember to set apply_bfloat16=True or load_kbit=16 for LLaMA when evaluating.

mdabedr commented 6 months ago

How can I use w2=1.0 and w2=35.0? Did you mean w1=1.0 and w2=35.0?

SeanLee97 commented 6 months ago

No, I mean w2=1.0 or w2=35.0.

angle.fit(
...
loss_kwargs={
  'w2': 1.0,
})

or

angle.fit(
...
loss_kwargs={
  'w2': 35.0,
})
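
The two settings can be prepared side by side for a comparison. This is a minimal sketch, assuming the remaining loss values stay as in the angle.fit call from the original post; make_loss_kwargs is a hypothetical helper, not part of angle_emb:

```python
# Hypothetical helper: build one loss_kwargs dict per w2 value to compare.
# The other values are copied from the angle.fit call earlier in this thread.
BASE_LOSS_KWARGS = {
    'w1': 1.0,
    'w3': 1.0,
    'cosine_tau': 20,
    'ibn_tau': 20,
    'angle_tau': 1.0,
}


def make_loss_kwargs(w2):
    """Return a copy of the base loss settings with the given w2."""
    return {**BASE_LOSS_KWARGS, 'w2': w2}


configs = [make_loss_kwargs(w2) for w2 in (1.0, 35.0)]
for cfg in configs:
    # Pass each dict as angle.fit(..., loss_kwargs=cfg) in a separate run.
    print(cfg)
```

Each dict is meant for its own training run, so the two resulting checkpoints can be evaluated on the same test set.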
rickeyhhh commented 6 months ago

Hello, I hit the error "RuntimeError: No GPU found. A GPU is needed for quantization.". Could you tell me your requirements, such as the Python, PyTorch, and CUDA versions?

mdabedr commented 6 months ago

Thank you so much. I appreciate you replying so quickly!

SeanLee97 commented 6 months ago

Hi @rickeyhhh, may I know which GPU you are using?

Below is my environment:

GPU: 3090 Ti
CUDA: 12.2

Python libraries:

bitsandbytes                  0.41.1
torch                         2.0.1
torchvision                   0.15.2a0
transformers                  4.34.0
rickeyhhh commented 6 months ago

Thanks for your reply. I have already solved this problem by matching the torch and CUDA versions.

I then hit another error, "ZeroDivisionError: integer division or modulo by zero", which I solved by using a lower version of bitsandbytes (0.39.0).
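
Version mismatches like these are easier to spot when the installed versions are printed in one place. The following is a minimal sketch using only the standard library; the package list mirrors the environment posted above, and installed_versions is a hypothetical helper:

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(['torch', 'bitsandbytes', 'transformers'])
for name, version in report.items():
    print(f'{name:<15}{version or "not installed"}')
```

If PyTorch is installed, torch.version.cuda and torch.cuda.is_available() additionally show which CUDA build it was compiled against and whether a GPU is visible.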

mdabedr commented 6 months ago

Hello @SeanLee97

Can you please give some insights on how to fine-tune the model for retrieval?

SeanLee97 commented 6 months ago

Hi @mdabedr

You can follow this script to fine-tune your model: https://github.com/SeanLee97/AnglE/blob/angle-bellm/examples/UAE/train.py

There are two steps:

1) Prepare your data in JSONL format, one pair per line, where label is 0 or 1:

{"text1": "here is text1", "text2": "here is text2", "label": 1}
{"text1": "here is text1", "text2": "here is text2", "label": 0}

2) Start fine-tuning your model, as follows:

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=1234 train.py \
--train_path your_custom_data.jsonl --save_dir ./ckpts/your-custom-model \
--model_name WhereIsAI/UAE-Large-V1 \
--w2 35 --learning_rate 5e-8 --maxlen 128 \
--epochs 2 \
--batch_size 32 \
--apply_lora 0 \
--save_steps 1000 --seed -1 --gradient_accumulation_steps 4 --fp16 1

Please set a small learning_rate, such as 1e-6, if you fine-tune from WhereIsAI/UAE-Large-V1.

⚠️ If you would like to fine-tune a retrieval-based model, please specify the following argument:

--prompt "Represent this sentence for searching relevant passages: {text}"
mdabedr commented 6 months ago

What I meant is this: given a query Q (a yes-no question) and the set of all articles, we have to retrieve the subset of articles that can answer it.

SeanLee97 commented 6 months ago

@mdabedr Given a query Q and a set of articles {A1, A2, ..., An}, suppose A1 and Ai are related to Q. The data can then be as follows:

{"text1": Q, "text2": A1, "label": 1}
{"text1": Q, "text2": A2, "label": 0}
...
{"text1": Q, "text2": Ai, "label": 1}
...
{"text1": Q, "text2": An, "label": 0}
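
Lines in this format can be generated from a query, the article list, and the set of relevant articles. A minimal sketch follows; build_pairs and the sample texts are illustrative assumptions, not part of angle_emb:

```python
import json


def build_pairs(query, articles, relevant_ids):
    """Pair the query with every article; label 1 if the article is relevant."""
    return [
        {'text1': query, 'text2': text, 'label': int(i in relevant_ids)}
        for i, text in enumerate(articles)
    ]


articles = ['Article about topic A.', 'Unrelated article.', 'More on topic A.']
pairs = build_pairs('Is topic A covered?', articles, relevant_ids={0, 2})

# JSONL: one JSON object per line.
lines = [json.dumps(pair, ensure_ascii=False) for pair in pairs]
print('\n'.join(lines))
```

Writing the printed lines to a .jsonl file gives a file that can be passed as --train_path.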
mdabedr commented 6 months ago

I see. Got it, thank you so much.

mdabedr commented 6 months ago

Can you please also give directions on how to test the trained model? The command and the data format?

SeanLee97 commented 6 months ago

@mdabedr The test data format is the same as the training. You can evaluate the performance using the following code:

angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pretrained_model_path='your_custom_model_path').cuda()
angle.evaluate(test_ds, device=angle.device)

For more training and evaluation details, see this Colab: https://colab.research.google.com/drive/1h28jHvv_x-0fZ0tItIMjf8rJGp3GcO5V?usp=sharing#scrollTo=5HYGEJebULjR

mdabedr commented 6 months ago

Would I need a specific prompt for the test?

SeanLee97 commented 6 months ago

@mdabedr Yes, testing should be consistent with training. You can set the prompt via set_prompt(), as follows:

angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pretrained_model_path='your_custom_model_path').cuda()
angle.set_prompt(prompt='YOUR PROMPT HERE')
angle.evaluate(test_ds, device=angle.device)
mdabedr commented 6 months ago

I am getting a BrokenPipeError with the trained model. After training, I wrote the following to load the test set from the JSONL file:

import json
with open('./valid.jsonl') as f:
    data = [json.loads(line) for line in f]

d={}

d["text1"]=[]
d["text2"]=[]
d["label"]=[]
for i in data:
    d["text1"].append(i["text1"])
    d["text2"].append(i["text2"])
    d["label"].append(i["label"])

from datasets import Dataset,DatasetDict

ds_test = Dataset.from_dict(d)

from angle_emb import AnglE, AngleDataTokenizer,Prompts

angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pretrained_model_path='/home/abedrahman/test/ckpts/FT_UAE_Abed').cuda()
angle.set_prompt(prompt=prompt)
# test_ds = ds_test.map(AngleDataTokenizer(angle.tokenizer, angle.max_length, prompt_template=Prompts.C), num_proc=8) '''with prompts'''
# test_ds = ds_test.map(AngleDataTokenizer(angle.tokenizer, angle.max_length) num_proc=8) '''no prompts'''

test_ds = ds_test.map(AngleDataTokenizer(angle.tokenizer, angle.max_length), num_proc=8)
print(angle.evaluate(test_ds, device=angle.device))

The mapping step on test_ds fails and throws a BrokenPipeError.

SeanLee97 commented 6 months ago

Could you attach a screenshot of the error?

SeanLee97 commented 6 months ago

You also need to set a prompt_template here:

test_ds = ds_test.map(AngleDataTokenizer(angle.tokenizer, angle.max_length, prompt_template=Prompts.C), num_proc=8)
mdabedr commented 6 months ago

Sure, here they are

[Two screenshots of the error]

SeanLee97 commented 6 months ago

@mdabedr Try to set num_proc=0 to see the actual error

mdabedr commented 6 months ago

Yeah, I tried just that before you gave me the tip, and the error stopped. It was most likely something with multiprocessing.

In any case, one last question: how do I get the evaluate method to return the results and/or the predicted labels?

And I just want to say thank you so much for your quick reply. I really appreciate it.

SeanLee97 commented 6 months ago

Hi @mdabedr, we do not provide a function to predict labels (that looks like a classification task). We only provide a function to obtain the sentence embeddings of the input texts, i.e., angle.encode(texts). We evaluate the model using Spearman's correlation. You can refer to this evaluation function to see how it works: https://github.com/SeanLee97/AnglE/blob/main/angle_emb/angle.py#L807.
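
For intuition about the metric, Spearman's correlation is the Pearson correlation of the ranks of two score lists. The sketch below is self-contained plain Python in which tied values share their average rank. It only illustrates the metric and is not the code used in angle_emb; the sample scores are made up:

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Pearson correlation computed on the ranks of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up cosine similarities and gold labels for four sentence pairs.
similarities = [0.91, 0.15, 0.78, 0.30]
labels = [1, 0, 1, 0]
print(spearman(similarities, labels))
```

The function divides by zero when one of the lists is constant, so a test set needs at least two distinct labels.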

If you want to judge two input sentences as similar or dissimilar, you can compute their cosine similarity based on their sentence embeddings. If their similarity is high, you can treat them as a similar pair; otherwise, they are dissimilar. You can set a threshold for this.
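
The thresholding idea can be sketched in a few lines of plain Python. The toy vectors stand in for the output of angle.encode(texts), and the threshold of 0.8 is an arbitrary assumption that should be tuned on validation data:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def is_similar(u, v, threshold=0.8):
    """Treat a pair as similar when its cosine similarity reaches the threshold."""
    return cosine_similarity(u, v) >= threshold


# Toy 3-dimensional vectors standing in for real sentence embeddings.
emb_a = [0.2, 0.9, 0.1]
emb_b = [0.25, 0.85, 0.05]
emb_c = [0.9, -0.1, 0.3]

print(is_similar(emb_a, emb_b))
print(is_similar(emb_a, emb_c))
```

With real embeddings, the same comparison yields a 0/1 prediction per pair, which can then be checked against the labels in the test file.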

mdabedr commented 6 months ago

--prompt "Represent this sentence for searching relevant passages: {text}"

I am not sure about this part in particular. Could you please explain it?

SeanLee97 commented 6 months ago

@mdabedr, this is an argument of train.py, see this code: https://github.com/SeanLee97/AnglE/blob/angle-bellm/examples/UAE/train.py#L19

It will be assigned to prompt_template, see this: https://github.com/SeanLee97/AnglE/blob/angle-bellm/examples/UAE/train.py#L91
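
The {text} placeholder in that argument marks where each input sentence is substituted. Here is a rough illustration with Python's str.format. That the library fills the template in exactly this way is an assumption, and apply_prompt is a hypothetical helper:

```python
PROMPT = 'Represent this sentence for searching relevant passages: {text}'


def apply_prompt(template, text):
    """Insert one input sentence into the {text} slot of the template."""
    return template.format(text=text)


print(apply_prompt(PROMPT, 'what is a sentence embedding?'))
```

The same template string should be used at training and test time so that the model sees inputs in a consistent form.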