Open annasoligo opened 1 week ago
Hi @annasoligo, thank you very much for raising this! Apologies for the late response, I try to be faster usually.
Could you explain exactly the differences between the two, such that we can draw an action plan? A reproducible example would be a terrific help to clarify it, and to make sure the bug remains fixed in future versions.
Hi @epignatelli, I noticed this when visualising some evaluations and noticing that the agent would repeatedly take actions that would cause them to step onto the ball, but their grid poition remains the same and the episode does not terminate. I can't provide a particularly useful example as I use a custom observation wrapper and visualisation code, but I can identify where the two implementations differ in the code!
In the Farama implementation if an agent takes an action to step onto an obstacle the episode is terminated (this is implemented in lines 161-167 of the dynamic_obstacles.py file). In the Navix implementation in actions record_walk_into
is called, but this only considers Goal, Wall and Lava collisions and does not record ball hits.
Wrt balls spawning on agents, the Farama implementation updates ball locations using the place_obj function, which only places objects in empty grid positions (ie not on the agent). The Navix implementation (update_balls
function) moves the balls and records ball hits when the new position is the agents current position.
At the moment I've fixed this locally by:
Updating record_walk_into
in the EventsManager
to record_ball_hit
if the entity is a ball.
Changing update_balls
and can_spawn_there
in the transitions file to only move the ball to empty squares, and to not override any ball hit events that just occurred from moving the agent.
Its quite a hacked together fix at the moment but happy to clean it up and share.
Thanks for the thorough description! That makes sense, if you haven't changed the default termination function, we do want the episode to terminate when a ball hits a player. The fact that the ball spawns on the player should not be a problem because that's going to be a terminal observation, which is not used by any algorithm that I know (the value of a terminal state is always 0, no need to regress there). We actually use that instead to record the ball hit event, but clearly something is not right there.
We need a minimal example, though, to be able to fix this definitively. If you can fork and push your edits, together with the code you use for running the experiments, we can then take it from there.
Hi, Thanks for the implementation!
I've noticed the DynamicObstacles implementations differs from the standard Minigrid versions found in e.g. MiniGrid Farama.
In the Farama implementation an agent takes an action towards a ball and this causes a termination, but obstacles will not spawn on agents. It seems in Navix, agents that take an action that would place them on a ball remain in the same place without registering a ball hit event, but terminations occur when balls move onto the agent. (The former arises from
record_walk_into
not registering ball hits, and the latter from_can_spawn_there
recording a ball hit).Was this an intentional difference?