aogara-ds opened this issue 2 years ago
I think you should leave fine-tuning as a last resort; try some other prompting methods first, maybe even shuffling the answer order.
Is it possible that it's just choosing the first "kill" option? It's clearly stated that killing is its objective, and option 4 is the first of four options that are all equivalent.
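One cheap way to test that hypothesis is to shuffle the answer order across several prompts and check whether the model keeps picking the same position rather than the same action. A minimal sketch, where `query_model` and the option texts are placeholders rather than anything from this repo:

```python
import random
from collections import Counter

def positional_bias_check(question, options, query_model, n_trials=20):
    """Shuffle the answer order across trials and count which option *text*
    (not which position) the model picks. If the same position keeps winning
    regardless of order, the preference is positional, not semantic."""
    picks = Counter()
    for _ in range(n_trials):
        order = list(range(len(options)))
        random.shuffle(order)
        prompt = question + "\n" + "\n".join(
            f"{i + 1}. {options[j]}" for i, j in enumerate(order)
        )
        # query_model is a stand-in for however the agent calls GPT-3;
        # assume it returns the 1-based index of the selected option.
        chosen = query_model(prompt)
        picks[options[order[chosen - 1]]] += 1
    return picks
```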
Pretrained language models are known to prefer certain kinds of answers for spurious reasons. Zhao et al. (2021) show that GPT-3 prefers certain answer choices in multiple-choice questions even when the input is content-free (e.g. "N/A"). Correcting this miscalibration can substantially improve multiple-choice QA performance.
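For reference, the contextual calibration from Zhao et al. (2021) is an inference-time correction, no fine-tuning required: measure the model's bias on a content-free input like "N/A", then rescale the answer probabilities by the inverse of that bias. A rough sketch, assuming we already have per-option probabilities from the model (the numbers below are made up):

```python
import numpy as np

def contextual_calibration(p_answer, p_content_free):
    """Contextual calibration in the spirit of Zhao et al. 2021: divide each
    answer probability by the probability the model assigns that answer on a
    content-free input (e.g. "N/A"), then renormalize. This cancels the
    model's prior bias toward particular answer positions or strings."""
    p_answer = np.asarray(p_answer, dtype=float)
    p_cf = np.asarray(p_content_free, dtype=float)
    calibrated = p_answer / p_cf          # equivalent to W = diag(p_cf)^-1, b = 0
    return calibrated / calibrated.sum()  # renormalize to a distribution

# Toy example: the raw model strongly prefers option 4, but so does the
# content-free baseline, so calibration flattens the spurious preference.
raw = [0.05, 0.05, 0.05, 0.60, 0.05, 0.10, 0.10]
cf  = [0.05, 0.05, 0.05, 0.55, 0.05, 0.10, 0.15]
print(contextual_calibration(raw, cf))
```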
Our GPT-3 action agent suffers from the same problem. When the killer is given seven answer choices, the last four of which involve killing someone, it chooses answer number 4 with unreasonably high probability.
To solve this, we could calibrate GPT-3's output probabilities with the technique proposed in Zhao et al. (2021), or we could just run RL training.