Aligner2024 / aligner

Achieving Efficient Alignment through Learned Correction
https://aligner2024.github.io/

Does the code train the Aligner, or use the Aligner to train the strong model? #5

Closed sev777 closed 3 months ago

sev777 commented 4 months ago

Also, do I need GPUs that can support training at least two 7B models?

Aligner2024 commented 3 months ago

Hi, @sev777,

First of all, we sincerely apologize for not answering your questions sooner. Our email address is anonymous, so we did not receive the notification in time. We answer your questions in detail below.

The code is designed for training the Aligner with full-parameter fine-tuning. When training the strong model using Aligner, we employ an offline and asynchronous approach to minimize resource consumption. Specifically, we first use the strong model to generate original responses. Then, we use Aligner to correct these responses, thereby obtaining the preference pairs for weak-to-strong correction.
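
As a rough illustration of the offline workflow described above, here is a minimal Python sketch. It is not code from this repository: `build_preference_pairs`, `PreferencePair`, and the two stub models are hypothetical names, and the strong model and the Aligner are represented as plain callables so that each stage can run separately. Treating the corrected response as the preferred one and skipping unchanged responses are both assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # Aligner-corrected response (assumed to be the preferred one)
    rejected: str  # original strong-model response


def build_preference_pairs(
    prompts: Iterable[str],
    generate: Callable[[str], str],      # hypothetical wrapper around the strong model
    correct: Callable[[str, str], str],  # hypothetical wrapper around the Aligner
) -> List[PreferencePair]:
    prompts = list(prompts)

    # Stage 1: the strong model produces the original responses.
    originals = [generate(p) for p in prompts]

    # Stage 2: the Aligner corrects each original response.
    # The stages are sequential, so the two models never have to run together.
    pairs = []
    for prompt, original in zip(prompts, originals):
        corrected = correct(prompt, original)
        # Assumption: an unchanged response carries no preference signal, so skip it.
        if corrected.strip() == original.strip():
            continue
        pairs.append(PreferencePair(prompt, corrected, original))
    return pairs


# Stub models, for illustration only.
def fake_strong_model(prompt: str) -> str:
    return f"draft answer to: {prompt}"


def fake_aligner(prompt: str, response: str) -> str:
    return response.replace("draft", "revised")


pairs = build_preference_pairs(
    ["How do I reset my password?"], fake_strong_model, fake_aligner
)
```

In practice the output of each stage could be written to disk, so that the strong model and the Aligner are loaded one at a time; whether that fits a given GPU budget depends on the setup.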

Once again, we sincerely apologize for the delay and hope that this answer is helpful to you.