Farama-Foundation / Minigrid

Simple and easily configurable grid world environments for reinforcement learning
https://minigrid.farama.org/

Firefighter #32

Closed diegoesteves closed 5 years ago

diegoesteves commented 6 years ago

Hey, very nice job!

I am wondering if you're planning to release a new environment for the firefighter problem, i.e., a grid world where a cell might have its state updated (burning, protected, none) after each iteration i.

(in a more simplistic configuration: a firefighter agent, an initial burning cell and a fixed object to protect)

Cheers!

maximecb commented 6 years ago

Thanks Diego :)

I've never heard of this environment before, but it seems possible if someone wants to take the time to implement it. Can you explain more about the mechanics? Ie: when do cells become burning? What does it mean for a cell to be protected?

diegoesteves commented 6 years ago

Absolutely! :-) Let's consider an NxN grid where (for the sake of simplicity) a single cell c_{1} has status = BURNING and the other cells c_{2}...c_{t} have status = NOT_BURNING, where t = total cells. A firefighter agent FF is randomly placed at a cell c_{x}, which (consequently) has status = PROTECTED. Thus, the simulation starts with 1 BURNING cell, 1 PROTECTED cell (where the agent is) and t-2 NOT_BURNING cells.

In a minimalistic config. at each time step:

  1. the fire spreads (i.e., all neighbor cells of c_{1} catch fire -> status=BURNING)
  2. the FF agent protects one cell (status = PROTECTED)
  3. the number of NOT_BURNING cells is updated.
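
The three steps above can be sketched in a few lines of plain Python. This is only an illustration: the thread does not say which cells count as neighbors, so the 4-connected neighborhood is an assumption, and the names Status, neighbors and time_step are hypothetical.

```python
from enum import Enum


class Status(Enum):
    NOT_BURNING = 0
    BURNING = 1
    PROTECTED = 2


def neighbors(row, col, size):
    """Yield the in-bounds 4-connected neighbors (an assumption) of a cell."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < size and 0 <= c < size:
            yield r, c


def time_step(grid, protect):
    """Run one time step in place and return the NOT_BURNING count."""
    size = len(grid)
    # Snapshot the burning cells first so the fire advances one ring per step.
    burning = [(r, c) for r in range(size) for c in range(size)
               if grid[r][c] is Status.BURNING]
    # 1. The fire spreads to every unprotected neighbor.
    for r, c in burning:
        for nr, nc in neighbors(r, c, size):
            if grid[nr][nc] is Status.NOT_BURNING:
                grid[nr][nc] = Status.BURNING
    # 2. The agent protects one cell, if the fire has not reached it yet.
    pr, pc = protect
    if grid[pr][pc] is Status.NOT_BURNING:
        grid[pr][pc] = Status.PROTECTED
    # 3. The number of NOT_BURNING cells is updated.
    return sum(cell is Status.NOT_BURNING for row in grid for cell in row)


# 4x4 grid: fire in one corner, agent (already PROTECTED) in the opposite one.
grid = [[Status.NOT_BURNING] * 4 for _ in range(4)]
grid[0][0] = Status.BURNING
grid[3][3] = Status.PROTECTED
remaining = time_step(grid, protect=(1, 1))
print(remaining)  # 11
```

Taking a snapshot of the burning cells before spreading keeps cells that caught fire in this step from spreading again within the same step.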

The objective of the game is to enclose and (consequently) stop the fire. There are some variations of this game (e.g., preventing the fire from touching a highway), but this would be (one of) the simplest scenarios.

maximecb commented 6 years ago

I think the easiest way to implement this sort of scenario without adding a new kind of object type would be to use colored floor tiles, and have each color mean something different, eg: red means burning, blue means not burning. Note that in MiniGrid the agent normally has a partially observable view and it can turn left/right.

Would every cell on the grid be able to catch fire, or only some cells which can burn?

The agent can protect a cell and stop it from burning by standing on it, but can this cell burn again once the agent moves away?

Do you have a link to another implementation, or pictures?

diegoesteves commented 6 years ago

Normally every cell which is not protected can burn. And as soon as the agent protects a cell, its status can no longer change (i.e., it stays protected). So this would be the minimalistic version of the game. Yes, I have a running implementation here, but was considering exploring your framework. Initial results using Q-Learning were not promising though.

maximecb commented 6 years ago

Seems like if all neighboring cells to a burning cell catch fire, and the agent can only move one cell at a time, everything will be burning pretty fast, no? Like, if you have an 8x8 grid, and you set the corner on fire, then in just 7 steps everything is burning?
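
As a quick check of that estimate, here is a minimal sketch that counts how many spread steps an unopposed fire needs to cover a grid. The function name and the diagonal flag are hypothetical; the thread does not fix the neighborhood, so both common choices are shown. The 7-step figure corresponds to counting diagonal neighbors.

```python
def steps_until_all_burning(size, start=(0, 0), diagonal=True):
    """Count spread steps until every cell of a size x size grid burns."""
    # Neighbor offsets: 8-connected if diagonal is True, else 4-connected.
    offsets = [(dr, dc)
               for dr in (-1, 0, 1)
               for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0) and (diagonal or dr == 0 or dc == 0)]
    burning = {start}
    steps = 0
    while len(burning) < size * size:
        # Every burning cell ignites all of its in-bounds neighbors.
        burning |= {(r + dr, c + dc)
                    for r, c in burning
                    for dr, dc in offsets
                    if 0 <= r + dr < size and 0 <= c + dc < size}
        steps += 1
    return steps


print(steps_until_all_burning(8, diagonal=True))   # 7
print(steps_until_all_burning(8, diagonal=False))  # 14
```

Either way the grid is lost quickly if the agent protects only one cell per step, which is the concern raised here.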

I can't see your implementation, seems it's private.

diegoesteves commented 6 years ago

So, there is a concept of budget here too (I didn't mention it before to keep things simple). One can define a value for this variable (e.g., 1.9) so that, at each time step, you can protect floor(budget + residue) cells, and the fractional part carries over as the new residue. For instance, at t=1 the FF agent can protect math.floor(1.9+0.0) = 1 cell (0.9 left). At t=2 the FF agent can protect math.floor(1.9+0.9) = 2 cells (0.8 left)...
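
The budget arithmetic can be sketched as follows. The function name protections_per_step is hypothetical, and using Fraction instead of floats is a choice made here so the residue stays exact over many steps.

```python
from fractions import Fraction
from math import floor


def protections_per_step(budget, steps):
    """Return how many cells can be protected at each of the given steps."""
    # Parse via str so that 1.9 becomes exactly 19/10 rather than a binary float.
    budget = Fraction(str(budget))
    residue = Fraction(0)
    counts = []
    for _ in range(steps):
        available = budget + residue
        whole = floor(available)      # cells that can be protected this step
        counts.append(whole)
        residue = available - whole   # fractional part carried to the next step
    return counts


print(protections_per_step(1.9, 5))  # [1, 2, 2, 2, 2]
```

Over 10 steps a budget of 1.9 yields 19 protected cells in total, so the long-run rate matches the budget.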

But overall, in the highway game (FF agent must protect the highway) we can consider that the fire always starts at a position below (Y-axis) the FF agent and the highway is always located at the top position in the grid.

diegoesteves commented 6 years ago

Oh, sorry, didn't notice that. I just made it public, so it should work now :-)

maximecb commented 6 years ago

If you can protect multiple cells at each step, this environment seems not super well suited for a gridworld, to be honest. It's more of a strategy game than something with an embodied agent.

diegoesteves commented 6 years ago

Hm...see, actually the protection action just changes the status of a cell, in practice (this was also the way I implemented before in OpenAI). But I could consider, for now, that the agent just protects 1 cell at each step. The problem is that unlike the examples I've found here, the environment, in this case, is dynamic. For instance, imagine the lava crossing game you have, but the lava moves at each iteration (like a fire would [let's not consider speed and other factors here :-)]) and you have to figure out a way to reach a certain cell X.

maximecb commented 6 years ago

To implement protection with multiple cells, I guess you could use the done action. The agent moves around, it uses the toggle action to protect the cell in front of it (or under it), and then it executes the done action when it has completed its "turn". Then the state of the world gets updated, and the agent gets to play again, protect more cells.

diegoesteves commented 6 years ago

OK, I will have a deeper look at the features of your library. Thanks! How would you model the fire (since it needs to spread)? As a single agent?

maximecb commented 6 years ago

You'd have to implement the logic in your environment class itself. I would write a function update_fire and call it when self.actions.done is executed by the agent. You'd override the step function to intercept the done and toggle actions. There are examples of environments which override step under gym_minigrid/envs/.

The fire I would model as just colored Floor tiles if you want the agent to be able to walk over them. Red for burning, blue for not burning and green for protected or some such scheme.
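
A rough, standalone sketch of that turn structure is below. It deliberately does not import MiniGrid, so it runs on its own: only the names toggle, done, step and update_fire come from this thread, while the Actions enum, the FirefighterSketch class, the explicit target argument and the 4-connected spread are assumptions made for illustration. In a real MiniGrid environment the agent would also move, and toggle would act on the cell in front of it.

```python
from enum import IntEnum


class Actions(IntEnum):
    # Hypothetical stand-in for an action enum; only the two names are from the thread.
    toggle = 0
    done = 1


# Floor colors, following the scheme suggested above.
RED, BLUE, GREEN = "red", "blue", "green"  # burning, not burning, protected


class FirefighterSketch:
    def __init__(self, size, fire_pos):
        self.colors = {(r, c): BLUE for r in range(size) for c in range(size)}
        self.colors[fire_pos] = RED

    def update_fire(self):
        """Spread fire from every red tile to its blue 4-connected neighbors."""
        burning = [pos for pos, color in self.colors.items() if color == RED]
        for r, c in burning:
            for pos in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                # dict.get returns None for out-of-bounds positions.
                if self.colors.get(pos) == BLUE:
                    self.colors[pos] = RED

    def step(self, action, target=None):
        """Intercept toggle (protect a tile) and done (end the turn)."""
        if action == Actions.toggle and self.colors.get(target) == BLUE:
            self.colors[target] = GREEN
        elif action == Actions.done:
            self.update_fire()
        return dict(self.colors)


env = FirefighterSketch(size=3, fire_pos=(0, 0))
env.step(Actions.toggle, target=(0, 1))  # protect both tiles next to the fire
env.step(Actions.toggle, target=(1, 0))
colors = env.step(Actions.done)          # the world updates only on done
print(sum(color == RED for color in colors.values()))  # 1
```

Because the fire only advances on done, the agent can protect several tiles within one turn, which is how a budget larger than 1 could be accommodated.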

maximecb commented 5 years ago

I'm going to close this for now because, after further consideration, it seems to me like this environment is too different from the other MiniGrid environments. It doesn't share the same actions and structure. I think it should be its own package. Feel free to fork MiniGrid if you find the code to be a useful starting point.