Closed jrhender closed 4 days ago
Interesting! This data was originally taken from OpenAI Evals (https://github.com/openai/evals/blob/main/evals/registry/data/theory_of_mind/LICENSE), which in turn seems to have drawn it from the SocialIQA dataset (https://allenai.org/data/socialiqa).
If there are problems in the targets, maybe best to take them up with the upstream? (just to confirm that there isn't something we are missing).
Thanks, will do 👍
Will close this for now, feel free to re-open once we've established what the right thing is.
The theory of mind example dataset contains a few samples that I think have an incorrect target.
I think samples 27, 30, 33 and 99 all use the "true belief" (that is, where the item actually is) as their target, whereas the question asks for the "false belief" held by an actor who was not in the room when the item was moved.
For example, in sample 27, because Owen exits the room and then Evelyn moves the shoes from the cupboard to the bucket, Evelyn should think that Owen thinks that the shoes are in the cupboard. However, the target answer of the sample is "bucket".
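To make the reasoning concrete, here is a rough Python sketch of the belief tracking I have in mind. The event encoding and the `track_beliefs` helper are my own illustration, not part of the dataset or of any eval code, and the events are only my paraphrase of sample 27. It models first-order beliefs only (an agent's belief updates only if they are in the room when the item moves); in this sample the second-order answer coincides with it, assuming Evelyn saw Owen leave.

```python
def track_beliefs(agents, start_location, events):
    """Return (true_location, beliefs) after replaying the events.

    Hypothetical encoding: ("exit", agent), ("enter", agent),
    or ("move", agent, new_location).
    """
    present = set(agents)                        # everyone starts in the room
    beliefs = {a: start_location for a in agents}
    location = start_location
    for kind, agent, *rest in events:
        if kind == "exit":
            present.discard(agent)
        elif kind == "enter":
            # Assumption: re-entering does not reveal the item's location.
            present.add(agent)
        elif kind == "move":
            location = rest[0]
            for witness in present:              # only witnesses update
                beliefs[witness] = location
    return location, beliefs


# Sample 27 as described above: Owen exits, then Evelyn moves the shoes.
events = [("exit", "Owen"), ("move", "Evelyn", "bucket")]
location, beliefs = track_beliefs(["Owen", "Evelyn"], "cupboard", events)
print(location)         # true location of the shoes
print(beliefs["Owen"])  # where Owen believes the shoes are
```

Under these assumptions the true location is "bucket" while Owen's belief stays at "cupboard", which is why I would expect "cupboard" as the target.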