Linear95 / SPAG

Self-playing Adversarial Language Game Enhances LLM Reasoning, NeurIPS 2024
Apache License 2.0

Any Evidence of Evolution of Emergent Strategies? #3

Open HanClinto opened 7 months ago

HanClinto commented 7 months ago

Excellent paper, thank you so much for publishing this!

Reading through the examples at the end of the paper, it feels almost like the Taboo game can be meta-gamed a bit. That is, if the defender never says anything other than "say more, please?", then eventually they will gather all the information they need. I'm thinking of the "panda" example here.

Target Word: panda

Attacker: let’s begin. I’m thinking of a creature that’s native to China and is known for its distinctive black and white coloration. Can you guess what it is?

Defender: I’m not sure yet, could you provide more clues?

Attacker: Absolutely, I can provide more clues. This creature is a symbol of peace and friendship in many countries. It’s also a popular mascot for various organizations and products. Any ideas?

Defender: I’m still not certain. Could you give me another hint?

Attacker: Of course, I can give you another hint. This creature is often seen in zoos and is known for its love of bamboo. It’s also a popular choice for children’s books and cartoons. What do you think it might be?

Defender: I know the word! It is panda.

Winner: Defender

Did you notice that playthroughs of the game resulted in "cheap" strategies like this emerging, or did the opportunity for deceiving the defender (and getting them to guess wrong) eventually win out?

It feels like the next optimal play would be for the attacker to give extremely strong hints that are completely disconnected from the target word, in an effort to get the defender to consistently guess the wrong word.

Once that happens, the attacker is just outputting noise, and it feels like the optimal next step in emergent strategies would be for the defender to say as little as possible. Without a penalty for dragging the game out for a long time, it feels like eventually the games would stall out into the realm of "the only winning move is not to play".
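For concreteness, the kind of length penalty I have in mind could be as simple as discounting the winner's reward by the number of turns. This is purely a hypothetical sketch of my suggestion, not how the paper or repo actually computes rewards:

```python
# Purely hypothetical sketch of a length-penalized game reward
# (not the reward scheme used in the SPAG paper or repo).

def length_penalized_reward(winner: str, num_turns: int,
                            base_reward: float = 1.0,
                            per_turn_penalty: float = 0.05) -> dict:
    """Return per-player rewards for one finished Taboo episode."""
    penalty = per_turn_penalty * num_turns
    reward = max(base_reward - penalty, 0.0)
    if winner == "attacker":
        return {"attacker": reward, "defender": -reward}
    if winner == "defender":
        return {"attacker": -reward, "defender": reward}
    # Stalled / tied game: both players eat the length penalty,
    # so "not playing" is no longer a free strategy.
    return {"attacker": -penalty, "defender": -penalty}
```

With something like that in place, the defender's "say as little as possible and stall" strategy stops being free, since dragging the game out eats into whatever reward they could eventually win.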

Did you happen to see any stages of strategy emergence like this? I would have loved to see some exposition analyzing the progression of strategies used by the LLMs (along with win-rate progressions for the defender vs. the attacker, relative conversation lengths, etc.), but if there was a section in the paper that talked about this, I missed it. Any chance you would release something about this in the future? Obviously I'm not expecting anything quite as involved as the OpenAI Hide and Seek video, but any data about that adversarial evolution could be really fascinating.

That said, the results are REALLY promising, and the gains on the reasoning benchmarks are especially impressive!

I wonder what other games could be implemented in this way...? Ideally there wouldn't be "cheap" strategies available, and I wonder if a co-op game would work for this -- perhaps something like Codenames: Duet?

Linear95 commented 6 months ago

Sorry for the late reply. We have discussed some of the emerging strategies on Twitter. Beyond that discussion, I just came across an interesting case where the attacker directly says "I know the word! It is (wrong word)", which can mislead the defender into accepting the wrong word as the target... However, we have banned this trick as a rule-breaking case during outcome judgment.
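For illustration, a rule-breaking check of that sort might look roughly like the sketch below; the episode format and function names are hypothetical and not the repo's actual judging code:

```python
import re

# Hypothetical sketch of a rule-breaking filter applied at outcome-judgment time.
# The episode format and names below are illustrative, not SPAG's actual code.

# Matches an attacker imitating the defender's guess announcement,
# e.g. "I know the word! It is bamboo."
FAKE_GUESS_PATTERN = re.compile(r"i know the word! it is\s+\w+", re.IGNORECASE)

def attacker_broke_rules(episode: list[dict], target_word: str) -> bool:
    """Flag episodes where the attacker says the target word outright
    or fakes a guess announcement to mislead the defender."""
    for turn in episode:
        if turn["role"] != "attacker":
            continue
        utterance = turn["text"].lower()
        # Rule 1: the attacker may never utter the target word itself.
        if target_word.lower() in utterance:
            return True
        # Rule 2: the attacker may not announce a (wrong) guess as if it
        # were the defender's, e.g. "I know the word! It is <wrong word>".
        if FAKE_GUESS_PATTERN.search(utterance):
            return True
    return False
```

Episodes flagged this way could then be scored as an attacker loss (or dropped from training), which is roughly the effect of treating the trick as rule-breaking during outcome judgment.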

Linear95 commented 6 months ago

Your suggestion about more experimental analysis is quite constructive. We are still working on additional experimental results and will take your suggestions into account. Thank you so much!