SeanLee97 / AnglE

Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard
https://arxiv.org/abs/2309.12871
MIT License

Training script for the Bert-based model on the NLI dataset #85

Open Hominnn opened 5 months ago

Hominnn commented 5 months ago

Dear author, I want to use the bert-base-uncased model to train on the NLI dataset with your method for some research. Could you provide the relevant training scripts so that I can better reproduce your experimental results? Below is my training script, which uses the same data as your training, but I cannot reproduce the evaluation results of your angle-bert-base-uncased-nli-en-v1 model.

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=1234 train_nli.py \
--task NLI-STS --output_dir ckpts/NLI-STS-bert-cls \
--model_name_or_path ../models/bert-base-uncased \
--learning_rate 5e-5 --maxlen 50 \
--epochs 1 \
--batch_size 10 \
--logging_steps 500 \
--warmup_steps 0 \
--save_steps 1000 --seed 42 --do_eval 0 --gradient_accumulation_steps 4 --fp16 1 --torch_dtype 'float32' \
--pooling_strategy 'cls'

This is my evaluation result on STS (image attached).

SeanLee97 commented 5 months ago

hello @Hominnn, the training code train_nli.py is too old. It is recommended to use angle-trainer now.

I've updated the NLI document (https://github.com/SeanLee97/AnglE/blob/main/examples/NLI/README.md#41-bert); you can find the new training script there.

To run it successfully,

1) please upgrade the angle-emb to the latest version via python -m pip install -U angle-emb

2) please use the latest evaluation code: https://github.com/SeanLee97/AnglE/blob/main/examples/NLI/eval_nli.py

3) if you want to push your model to huggingface, please set --push_to_hub 1 and specify a model id under your account via --hub_model_id xxx; otherwise, set --push_to_hub 0.
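For reference, the STS evaluation ultimately reports the Spearman correlation between the model's cosine similarities and the human-annotated gold scores. Here is a minimal, self-contained sketch of that scoring step; the toy 2-d vectors stand in for real sentence embeddings, and the `cosine`/`spearman` helpers are hand-rolled for illustration, not part of angle-emb:

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def spearman(xs, ys):
    # Spearman correlation = Pearson correlation of the ranks
    # (no tie handling; fine for this toy example).
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0.0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Three toy sentence pairs: embeddings of each pair, plus gold similarity scores.
pairs = [((1.0, 0.0), (1.0, 0.2)),
         ((1.0, 0.0), (0.5, 1.0)),
         ((1.0, 0.0), (0.0, 1.0))]
gold = [5.0, 3.0, 1.0]

sims = [cosine(u, v) for u, v in pairs]
print(round(spearman(sims, gold), 4))  # → 1.0 (similarities rank exactly like gold)
```

The real eval_nli.py runs this kind of scoring over the full STS12-16, STSBenchmark, and SICK-R test sets.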

Here are the intermediate results (after about 9 epochs) of my run:

+-------+-------+-------+-------+-------+--------------+-----------------+-------+
| STS12 | STS13 | STS14 | STS15 | STS16 | STSBenchmark | SICKRelatedness |  Avg. |
+-------+-------+-------+-------+-------+--------------+-----------------+-------+
| 75.59 | 84.83 | 80.37 | 86.26 | 81.96 |    85.12     |      80.70      | 82.12 |
+-------+-------+-------+-------+-------+--------------+-----------------+-------+
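As a quick sanity check, the Avg. column above is simply the arithmetic mean of the seven per-task scores:

```python
# Per-task Spearman scores from the table above.
scores = [75.59, 84.83, 80.37, 86.26, 81.96, 85.12, 80.70]
print(round(sum(scores) / len(scores), 2))  # → 82.12
```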

You can try increasing the number of epochs, ibn_w, or gradient_accumulation_steps for better results.

I am still training several models with different hyperparameters; I will let you know the better hyperparameters when they are done.

Hominnn commented 5 months ago

Thank you for your thorough reply. Looking forward to more of your meaningful work!