google-deepmind / concordia

A library for generative social simulation
Apache License 2.0

Implementing Physical Constraints in the Output of `event_statement` #26

Closed paehal closed 4 months ago

paehal commented 7 months ago

I am currently working on a task using Concordia where two agents move around a two-dimensional XY coordinate space based on their chosen actions, to pick up a ball located in a room.

I am implementing this task based on the cyberball example provided in the library. However, I have a question regarding the output of the event_statement following an attempted_action.

In game_master.py, the event_statement is produced as the result of an action attempt:

prompt, event_statement = thought_chains.run_chain_of_thought(
    self._update_from_player_thoughts, action_attempt, prompt, player_name,
)

For this task, instead of using chain_of_thought for output, I want to output the physical movement results in terms of coordinates. (e.g., Agent 1 moved +5 in the x-coordinate and +2 in the y-coordinate, so the current coordinates of Agent 1 are (5,2)). The default output using chain_of_thought seems to trace the thought process of the agents, which I believe does not suit my task. Would it be better to implement this using a component like ball_status.py that holds information about the ball, similar to the cyberball task?
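To make the desired behavior concrete, here is a minimal sketch of what "grounded" coordinate tracking could look like: positions are kept in plain Python state and the event statement is rendered from that state, rather than narrated by the chain of thought. The class and method names here are illustrative, not part of Concordia's API.

```python
class AgentPositions:
  """Tracks (x, y) coordinates for each agent as grounded Python state."""

  def __init__(self):
    self._positions = {}

  def add_agent(self, name, x=0, y=0):
    self._positions[name] = (x, y)

  def move(self, name, dx, dy):
    """Applies a displacement and returns the new coordinates."""
    x, y = self._positions[name]
    self._positions[name] = (x + dx, y + dy)
    return self._positions[name]

  def event_statement(self, name, dx, dy):
    """Renders the movement result as text the GM could emit as an event."""
    x, y = self.move(name, dx, dy)
    return (f'{name} moved {dx:+d} in the x-coordinate and {dy:+d} in the '
            f'y-coordinate, so the current coordinates of {name} are ({x},{y}).')


positions = AgentPositions()
positions.add_agent('Agent 1')
statement = positions.event_statement('Agent 1', 5, 2)
```

Because the coordinates only ever change through `move`, the event text is guaranteed to match the actual state.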

I am not very familiar with Concordia and would appreciate any advice.

jzleibo commented 7 months ago

Very glad to hear you're finding Concordia useful!

Here are a couple of things to keep in mind:

  1. The game master's thought chain is the sequence of reasoning steps the GM will apply on each step to convert an agent's attempted action to its actual effect. It normally includes just the state of the GM, not the agents.
  2. As of the update I pushed a few hours ago, there is one other situation in which the GM thought chain can include agent state: when one agent's action on their own turn entails another agent voluntarily acting in response, e.g. on Alice's turn she attempts to "sell Bob a pack of cigarettes". If you include the thought AccountForAgencyOfOthers, it will try to detect these cases and ask the second agent whether they would actually act that way (e.g. "Would Bob buy a pack of cigarettes from Alice?").
  3. Once I added the AccountForAgencyOfOthers thought, I was able to modernize the cyberball example to use it; that update was also pushed a few hours ago. This situation comes up a lot in cyberball, since agents often try to take the ball from each other.
  4. The updated cyberball example now has an example of a custom GM thought chain.
  5. The updated cyberball example now also includes who has the ball in the GM thought chain, since that information affects everyone's affordances.

As for your question about whether it's best to use a python grounded variable for the coordinates versus a component like player_status, it really depends on what you want to do. If the coordinates are the critical part of the simulation, and you can't afford the possibility of hallucination with them, then it's better to store them in a python grounded variable and be very careful how you set their values. Ultimately the difference comes down to whether you get the value of the variable via a multiple choice question or via a free response question. Either way you can include as much reasoning or code as you like to ensure the answer is correct.
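The multiple-choice approach above can be sketched in a few lines. This is an illustrative stand-in, not Concordia's actual API: the grounded value is only ever updated from a fixed set of options, so a hallucinated free-text answer can never corrupt it.

```python
# The only moves the grounded state will accept.
MOVES = {
    'north': (0, 1),
    'south': (0, -1),
    'east': (1, 0),
    'west': (-1, 0),
}


def update_position(position, model_answer):
  """Applies a move only if the model's answer is one of the options."""
  if model_answer not in MOVES:
    return position  # Reject anything outside the option set.
  dx, dy = MOVES[model_answer]
  return (position[0] + dx, position[1] + dy)


# A stand-in for the LLM call: imagine the GM asked a multiple-choice
# question like "Which way did Alice move? (north/south/east/west)".
pos = (0, 0)
pos = update_position(pos, 'east')    # valid option: state changes
pos = update_position(pos, 'fly up')  # hallucinated answer: ignored
```

With free response, by contrast, the model's text itself becomes the state, so any hallucination propagates directly into the simulation.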

paehal commented 6 months ago

Thank you for your response. I appreciate you clarifying the key points I should be mindful of. I will also look into the updates related to the cyberball task in the future.

In response to your comment, I believe the answer is Yes. Specifically, I am interested in having two agents cooperate to pick up two balls, a task known as simple spread, which is a basic task in the MARL (Multi-Agent Reinforcement Learning) field. Therefore, I think it is crucial to accurately describe the position of the agents after they move as the next event.

If the coordinates are the critical part of the simulation, and you can't afford the possibility of hallucination with them, then it's better to store them in a python grounded variable and be very careful how you set their values.

I am currently undecided on whether to implement grounding for coordinate information or to add a component. (I am not yet at a level where I can make that decision and would appreciate some advice.)

  1. If I choose to implement grounding for the coordinate information, should I integrate that implementation into the part of game_master.py where the event_statement output is generated?

  2. I am considering creating components like ball_status.py, similar to the cyberball task, to organize information such as the current position of the agents. Would this approach be practical?

Additionally, regarding the implementation of option 2, I have added a new component to the definition of the game master, as shown below, but it seems that this component is not updated in env.step(). Why might this be? Is there anything else I need to implement?

# @title Create the game master object
env = game_master.GameMaster(
    model=model,
    memory=game_master_memory,
    clock=clock,
    players=players,
    update_thought_chain=thought_chain,
    components=[
        instructions,
        general_knowledge_of_premise,
        important_facts,
        rules_of_the_game,
        relevant_events,
        time_display,
        player_status,
        ball_status_component,
        my_new_component,  # placeholder for the new custom component
        convo_externality,
        direct_effect_externality,
    ],
    randomise_initiative=True,
    player_observes_event=False,
    players_act_simultaneously=False,
    verbose=True,
)

I apologize for the inconvenience and thank you in advance for your response.

jzleibo commented 6 months ago

It sounds like, for your use case, it might also be useful to look at the election and inventory components. They do things a bit differently from ball_status. It's not totally clear which approach would be best for your case. If you do look at ball_status, make sure you have the latest version of the code, since it changed quite recently and I believe the previous version had a bug.

For grounded variables, they could be implemented by modifying the game master as you point out. However, that's not the recommended way. The way we have intended it to work is for all grounding to be implemented via components. That's what the inventory and elections components do. They ask a series of yes/no and multiple choice questions in order to set the values of grounded variables.

As for why your custom component is not updating, that sounds odd to me. If you are passing it there in the list of GM components then update should be getting called. Here is a link to the line where it happens. Maybe it is getting called but logs are getting swallowed by the multithreading? You could try replacing the multithreading inside the update_components function in the GM with the equivalent for loop calling component.update() on each component in self._components one at a time. That might make it easier to debug anyway.
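The debugging suggestion above can be sketched as follows. This is a generic stand-in for the GM's threaded `update_components`, not the actual Concordia code: each component is updated one at a time, so any exception or log output surfaces in order and it is obvious which component misbehaves.

```python
class _DummyComponent:
  """Minimal stand-in for a GM component with an update() method."""

  def __init__(self):
    self.updated = False

  def update(self):
    self.updated = True


def update_components_sequential(components):
  """Sequential replacement for a threaded update loop, for debugging.

  Any exception raised by a component now stops execution right here,
  with a clear traceback, instead of being swallowed by a thread pool.
  """
  for component in components:
    component.update()


comps = [_DummyComponent(), _DummyComponent()]
update_components_sequential(comps)
```

Once the misbehaving component is found and fixed, the threaded version can be restored for speed.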

paehal commented 6 months ago

@jzleibo

I've been experimenting with various approaches over the past week. I would like to clarify that the error I mentioned earlier was due to a mistake in my implementation. My apologies for any confusion caused.

Regarding your suggestion:

For grounded variables, they could be implemented by modifying the game master as you point out. However, that's not the recommended way. The way we have intended it to work is for all grounding to be implemented via components. That's what the inventory and elections components do. They ask a series of yes/no and multiple choice questions in order to set the values of grounded variables.

Following your advice, I created a new component to manage aspects like the position and orientation of balls and agents. This has allowed me to achieve the desired behavior to some extent.

However, this has led to some questions:

  1. Partial Observations for Each Agent: Are partial observations supported for each agent? The technical report mentions that the Game Master (GM) returns observations relevant to each agent. Can the GM tailor its results to each agent's state? For instance, if agent B knows something that agent A does not, can the event (observation) resulting from A's action avoid presupposing B's knowledge?

  2. Setting Agents' Objectives: Should the goal variable in the player configuration be used to set the objectives for the agents?

I would greatly appreciate your feedback on these points.