Does the code train the Aligner, or use the Aligner to train the strong model?

Hi, @sev777,

First of all, we sincerely apologize for not answering your question sooner. Because our email address is anonymous, we did not receive the notification in time. Our detailed answer follows.

The code is designed for training the Aligner with full-parameter fine-tuning. When training the strong model using Aligner, we employ an offline and asynchronous approach to minimize resource consumption. Specifically, we first use the strong model to generate original responses. Then, we use Aligner to correct these responses, thereby obtaining the preference pairs for weak-to-strong correction.
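The offline flow described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this repository: `strong_generate` and `aligner_correct` are hypothetical callables standing in for inference with the strong model and with the trained Aligner, and the `PreferencePair` container is an assumed data layout.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class PreferencePair:
    prompt: str
    original: str   # response generated by the strong model
    corrected: str  # the Aligner's correction of that response


def build_preference_pairs(
    prompts: Iterable[str],
    strong_generate: Callable[[str], str],       # hypothetical: prompt -> response
    aligner_correct: Callable[[str, str], str],  # hypothetical: (prompt, response) -> correction
) -> List[PreferencePair]:
    prompts = list(prompts)

    # Stage 1: the strong model answers every prompt. In an offline setup
    # these responses could be written to disk so the strong model can be
    # unloaded before the Aligner is loaded.
    originals = [strong_generate(p) for p in prompts]

    # Stage 2: the Aligner corrects each stored response, which yields one
    # (original, corrected) pair per prompt.
    return [
        PreferencePair(prompt=p, original=o, corrected=aligner_correct(p, o))
        for p, o in zip(prompts, originals)
    ]


if __name__ == "__main__":
    # Stub callables so the sketch runs without any model weights.
    demo = build_preference_pairs(
        ["How do I reset my password?"],
        strong_generate=lambda p: "draft answer to: " + p,
        aligner_correct=lambda p, a: a.replace("draft", "revised"),
    )
    print(demo[0])
```

Because the two stages only communicate through the stored responses, they can run at different times and on different hardware, which is where the resource saving comes from.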

Once again, we sincerely apologize for the delay in answering and hope that our answer will be helpful to you.
