agi-templar / Stable-Alignment

Multi-agent Social Simulation + an Efficient, Effective, and Stable alternative to RLHF. Code for the paper "Training Socially Aligned Language Models in Simulated Human Society".
https://arxiv.org/pdf/2305.16960.pdf

About the CoH implementation #7

Open Guochry opened 1 year ago

Guochry commented 1 year ago

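Hello, authors! I would like to ask about the implementation details of the CoH baseline in the experiments section. The loss function in the CoH paper also includes a loss term computed on a pre-training corpus. Which pre-training corpus did you use for this term when reproducing the baseline? Thank you very much for your reply!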
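To make the question concrete, here is a minimal sketch of the kind of objective I mean: a next-token loss on the feedback data plus a weighted next-token loss on pre-training text. This is only my reading of the setup, not the CoH authors' code or this repo's implementation. The function names and the `pretrain_weight` parameter are hypothetical, and the actual weight and corpus are exactly what I am unsure about.

```python
import math

def token_nll(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def mean_nll(batch_logits, batch_targets, loss_mask=None):
    """Average NLL over positions whose mask is 1 (all positions if None)."""
    if loss_mask is None:
        loss_mask = [1] * len(batch_targets)
    total = sum(
        token_nll(logits, target) * mask
        for logits, target, mask in zip(batch_logits, batch_targets, loss_mask)
    )
    count = sum(loss_mask)
    return total / count if count else 0.0

def combined_loss(feedback_loss, pretrain_loss, pretrain_weight=1.0):
    """Hypothetical combination: feedback loss plus a weighted
    pre-training loss. `pretrain_weight` is an assumed hyperparameter."""
    return feedback_loss + pretrain_weight * pretrain_loss

# Toy example with a 4-token vocabulary and uniform logits.
feedback = mean_nll([[0.0] * 4, [0.0] * 4], [1, 2], loss_mask=[0, 1])
pretrain = mean_nll([[0.0] * 4], [3])
loss = combined_loss(feedback, pretrain, pretrain_weight=0.5)
```

In this sketch the mask restricts the feedback loss to the positions being trained on, while the pre-training term is averaged over all tokens of the sampled pre-training text.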