epignatelli / navix

Accelerated minigrid environments with JAX
Apache License 2.0
103 stars 7 forks source link

Difference from standard dynamic obstacles implementations #91

Open annasoligo opened 1 week ago

annasoligo commented 1 week ago

Hi, Thanks for the implementation!

I've noticed the DynamicObstacles implementations differs from the standard Minigrid versions found in e.g. MiniGrid Farama.

In the Farama implementation an agent takes an action towards a ball and this causes a termination, but obstacles will not spawn on agents. It seems in Navix, agents that take an action that would place them on a ball remain in the same place without registering a ball hit event, but terminations occur when balls move onto the agent. (The former arises from record_walk_into not registering ball hits, and the latter from _can_spawn_there recording a ball hit).

Was this an intentional difference?

epignatelli commented 4 days ago

Hi @annasoligo, thank you very much for raising this! Apologies for the late response, I try to be faster usually.

Could you explain exactly the differences between the two, such that we can draw an action plan? A reproducible example would be a terrific help to clarify it, and to make sure the bug remains fixed in future versions.

annasoligo commented 3 days ago

Hi @epignatelli, I noticed this when visualising some evaluations and noticing that the agent would repeatedly take actions that would cause them to step onto the ball, but their grid poition remains the same and the episode does not terminate. I can't provide a particularly useful example as I use a custom observation wrapper and visualisation code, but I can identify where the two implementations differ in the code!

In the Farama implementation if an agent takes an action to step onto an obstacle the episode is terminated (this is implemented in lines 161-167 of the dynamic_obstacles.py file). In the Navix implementation in actions record_walk_into is called, but this only considers Goal, Wall and Lava collisions and does not record ball hits.

Wrt balls spawning on agents, the Farama implementation updates ball locations using the place_obj function, which only places objects in empty grid positions (ie not on the agent). The Navix implementation (update_balls function) moves the balls and records ball hits when the new position is the agents current position.

At the moment I've fixed this locally by: Updating record_walk_into in the EventsManager to record_ball_hit if the entity is a ball. Changing update_balls and can_spawn_there in the transitions file to only move the ball to empty squares, and to not override any ball hit events that just occurred from moving the agent. Its quite a hacked together fix at the moment but happy to clean it up and share.

epignatelli commented 2 days ago

Thanks for the thorough description! That makes sense, if you haven't changed the default termination function, we do want the episode to terminate when a ball hits a player. The fact that the ball spawns on the player should not be a problem because that's going to be a terminal observation, which is not used by any algorithm that I know (the value of a terminal state is always 0, no need to regress there). We actually use that instead to record the ball hit event, but clearly something is not right there.

We need a minimal example, though, to be able to fix this definitively. If you can fork and push your edits, together with the code you use for running the experiments, we can then take it from there.