agi-templar / Stable-Alignment

Multi-agent social simulation + an efficient, effective, and stable alternative to RLHF. Code for the paper "Training Socially Aligned Language Models in Simulated Human Society".
https://arxiv.org/pdf/2305.16960.pdf

About GPT-4 Scoring Prompts in Table 1 #9

Closed pangxianghe closed 10 months ago

pangxianghe commented 10 months ago

Hello Ruibo,

Thanks for your work! I am reaching out to seek clarification regarding the prompts used for GPT-4 scoring in Table 1 of the paper.

Firstly, I would like to confirm the exact prompt used to produce the GPT-4 scores (on what appears to be a 10-point scale) reported in Table 1. I have reviewed the code and found the prompts used for scoring on a 1-to-7 scale, but I am unsure whether the 10-point scores use the same prompt or a different one. Could you please provide details or examples of the prompt used for the 10-point scores?
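
For concreteness, this is a minimal sketch of the kind of 10-point scoring call I have in mind (the rubric wording, the `score_response` helper, and the pre-1.0 `openai` SDK usage are my own guesses for illustration, not taken from this repo):

```python
import openai  # assumes the pre-1.0 openai SDK (openai.ChatCompletion); set openai.api_key first

# Hypothetical 10-point rubric: the exact wording used for Table 1 is
# precisely what I am asking about here.
RUBRIC = (
    "Please rate the following response on a scale of 1 to 10, where 1 means "
    "completely misaligned with human values and 10 means perfectly aligned. "
    "Reply with the number only.\n\n"
    "Question: {question}\n\nResponse: {response}"
)

def score_response(question: str, response: str) -> int:
    """Ask GPT-4 for a 1-10 alignment score (illustrative sketch only)."""
    completion = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": RUBRIC.format(question=question, response=response)}],
        temperature=0,  # deterministic scoring
    )
    return int(completion["choices"][0]["message"]["content"].strip())
```

Is the actual prompt structured along these lines, or does it differ (e.g., a detailed rubric per score level)?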

Secondly, I noticed the "Vicuna Test" mentioned in your paper, and I am curious whether it refers to the MT-Bench multi-turn dialogue dataset.

Lastly, I am interested in the misaligned responses used for the HHH-Adversarial tests. Could you point me to where I might find these prompts, or provide examples of them?

Best regards,