Aligner2024 / aligner

Achieving Efficient Alignment through Learned Correction
https://aligner2024.github.io/

Aligner-7B improves 11 different LLMs by 21.9% in helpfulness and 23.8% in harmlessness on average (GPT-4 by 17.5% and 26.9%). #1

Closed iGangao closed 5 months ago

iGangao commented 6 months ago

Aligner-7B improves 11 different LLMs by 21.9% in helpfulness and 23.8% in harmlessness on average (GPT-4 by 17.5% and 26.9%). When finetuning (strong) Llama270B with (weak) Aligner-13B’s supervision, we can improve Llama2 by 8.2% in helpfulness and 61.6% in harmlessness.

Has the output after Aligner been evaluated on other indicators?

Aligner2024 commented 6 months ago

Hi, @iGangao,

First, we sincerely apologize for not answering your question in a timely manner: because our email address is anonymous, we did not receive the notification in time. We answer your questions in detail below.

As mentioned in our paper [https://aligner2024.github.io/materials/Aligner_anonymous.pdf], we further validated Aligner's ability to incorporate additional alignment aspects.

Following the design of the Empathetic Dialogue dataset, we selected a 4K training set and a 1.5K test set from its publicly available data (ensuring the dataset is used appropriately and that there is no leakage between splits). We then fine-tuned Aligner-7B and Aligner-13B on the training set to enhance their empathetic abilities.
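The split described above can be sketched as follows. This is a hypothetical illustration only: the 4K/1.5K sizes come from the comment, but the field names, shuffling logic, and leakage check are assumptions, not the authors' actual preprocessing code.

```python
import random

def split_dialogues(dialogues, n_train=4000, n_test=1500, seed=42):
    """Shuffle dialogues and carve out disjoint train/test subsets,
    checking that no example leaks from one split into the other."""
    rng = random.Random(seed)
    pool = list(dialogues)
    rng.shuffle(pool)
    train = pool[:n_train]
    test = pool[n_train:n_train + n_test]
    # Leakage check: no prompt may appear in both splits.
    assert not {d["prompt"] for d in train} & {d["prompt"] for d in test}
    return train, test

# Toy usage with placeholder records standing in for Empathetic Dialogue entries.
data = [{"prompt": f"p{i}", "response": f"r{i}"} for i in range(10000)]
train, test = split_dialogues(data)
print(len(train), len(test))  # 4000 1500
```

The resulting training set would then be used for supervised fine-tuning of Aligner-7B/13B, with the held-out test set reserved for the empathy evaluation reported in the paper.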

The fine-tuned Empathy-Aligner-7B and Empathy-Aligner-13B not only improved the empathy of more than 50% of GPT-4's responses, significantly surpassing the original Aligner models, but also showed improved helpfulness after fine-tuning.

Detailed experimental results are shown in Table 16 of our paper.

This experiment demonstrates the Aligner models' strong transfer to other domains, as well as the low cost and high efficiency of adding new capabilities to an Aligner.

Detailed information can be found in the paper Appendix C.5.

We have also evaluated additional metrics (such as coding, mathematical, and dialogue capabilities) and will update our paper soon at [https://aligner2024.github.io/materials/Aligner_anonymous.pdf].

Once again, we sincerely apologize for the delayed reply and hope our answer is helpful to you.