google-deepmind / concordia

A library for generative social simulation
Apache License 2.0

What are the reasonable results of the "three_key_questions" experiment? #17

Closed: paehal closed this issue 5 months ago

paehal commented 9 months ago

Thank you for the wonderful project and for making the code publicly available.

As I am unfamiliar with projects involving LLM multi-agent systems, this may be a simple question, but I would like to ask nonetheless.

I am testing 'three_key_questions.ipynb' with GPT-3.5. In this setup, Alice is supposed to have a conversation in which she apologizes to Bob. However, in my three trials, they (the four agents) just keep playing various games, lol. Even extending the 'episode_length' to 10 didn't change this outcome.
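
For reference, this is roughly how I configure the model and the episode length in the notebook. The class and argument names below follow the example notebook as I remember it and may differ in the current version of the library, so please treat them as assumptions rather than the exact code:

```python
# Sketch of my setup. Class and argument names are assumptions based on the
# example notebook and may have changed in newer versions of concordia.
from concordia.language_model import gpt_model

GPT_API_KEY = '...'  # placeholder for my OpenAI API key

model = gpt_model.GptLanguageModel(
    api_key=GPT_API_KEY,
    model_name='gpt-3.5-turbo',  # the model I tested with
)

episode_length = 10  # raised from the default; the agents still just play games
for _ in range(episode_length):
    env.step()  # `env` is the game master object built earlier in the notebook
```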

The technical report (https://arxiv.org/abs/2312.03664) didn't include the results of this experiment, so could you please let me know whether this replicates your results (including which version of GPT you used)?

I understand that, unlike typical multi-agent reinforcement learning code, it may be difficult to reproduce results exactly in this case.

Sincerely,

jzleibo commented 9 months ago

I don't know what happens if you use GPT-3.5, but I believe I mostly observed the agents arguing with each other in this example.

We are hoping to be able to add example outputs for specific language models in the future.

jzleibo commented 7 months ago

Hi again, we just added an example of the experiment output here:

https://github.com/google-deepmind/concordia/blob/main/examples/three_key_questions_example_output.html