rickyloynd-microsoft opened 9 months ago
Hi Ricky,
Thank you for spotting this.
A fix could be made, especially for objects blocking walls. However, we are not sure how to properly describe walls. In BabyAI, the same wall spans multiple cells of the grid. If we naively described walls the way we describe objects, a wall would be described once for every cell it occupies (e.g. you see a wall 4 steps right, you see a wall 4 steps right and 1 step forward, you see a wall 4 steps right and 2 steps forward...).
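To make the problem concrete, here is a quick sketch of what such a naive per-cell scan would produce (my own illustration, not actual BabyAI-Text code; it assumes MiniGrid's encoding where `image[x][y][0] == 2` marks a wall cell and the agent stands at grid position (3, 6)):

```python
# A sketch of the naive per-cell scan (not actual BabyAI-Text code).
# Assumes MiniGrid's encoding: image[x][y][0] == 2 marks a wall cell,
# and the agent stands at grid position (3, 6) facing forward.
def naive_wall_descriptions(image):
    lines = []
    for x in range(7):
        for y in range(7):
            if image[x][y][0] == 2:  # a wall cell
                right, forward = x - 3, 6 - y
                parts = []
                if right != 0:
                    side = "right" if right > 0 else "left"
                    parts.append(f"{abs(right)} steps {side}")
                if forward != 0:
                    parts.append(f"{forward} steps forward")
                lines.append("You see a wall " + " and ".join(parts))
    return lines  # one line per wall cell, hence the flood of near-duplicates
```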
For the record, we initially designed BabyAI-Text for single-room tasks. In those cases, the description of walls should not impact the agent much (even though it is not faithful to BabyAI, and comparing two agents receiving the symbolic and textual observations should be done with caution).
Good point, we definitely wouldn't want the text description to mention every cell belonging to a wall. Here's a different wall-detection algorithm that should solve all of these problems.
First we note a very special case, which is when the agent stands inside an open door, and BabyAI-Text currently says "You see a wall 1 step left" and "You see a wall 1 step right" (unless the agent has turned 90 degrees). In this special case it would be more informative to simply say "You stand in a red doorway" and say nothing about the wall to which that door belongs. The presence of the wall is implied by the doorway. This lets us avoid the tricky question of whether an agent sees a wall while standing in the wall's doorway.
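A rough sketch of that check (my assumptions: the agent's own cell is `image[3][6]`, object type 4 is a door, and `cell[1]` is MiniGrid's color index):

```python
# A sketch of the doorway special case. Assumptions: the agent's own cell
# is image[3][6], object type 4 is a door, and cell[1] is MiniGrid's
# color index (0 = red, 1 = green, 2 = blue, 3 = purple, 4 = yellow, 5 = grey).
IDX_TO_COLOR = ["red", "green", "blue", "purple", "yellow", "grey"]

def doorway_description(image):
    cell = image[3][6]  # the cell the agent is standing on
    if cell[0] == 4:    # the agent stands in an (open) door
        return f"You stand in a {IDX_TO_COLOR[cell[1]]} doorway"
    return None         # not in a doorway; use the normal wall logic
```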
Apart from this special case, and regardless of the number of rooms (1 or 9), we want the agent to always see exactly two walls, one running horizontally and one running vertically. (That's because each room is 6x6 in size, and the view is always 7x7.) And we would like each of those walls to be described just once in the current format, such as "You see a wall 2 steps forward" and "You see a wall 3 steps right", regardless of how many cells (2 or more) are occupied by each wall.

To produce these results and prevent walls from disappearing from view, the algorithm would scan as usual over the 49 cells observed by the agent, and whenever it finds a wall cell, an inner loop would scan over that cell's 4 neighbors. If any neighbor cell is occupied by another wall cell, we have two neighboring wall cells, and therefore a wall. The algorithm can immediately record the position of that wall, either horizontal or vertical.

If the horizontal wall's position has already been identified and a new pair of wall cells is found running in that direction, no change is required, because that wall's recorded position (like "2 steps forward") should match the position determined by the new pair. Once the scan over all 49 cells is complete, each wall's position is uniquely identified and can be reported.
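A sketch of that neighbor-pair scan might look like this (my own illustration; it uses MiniGrid's type codes 2 = wall and 4 = door, and returns raw grid indices rather than agent-relative steps):

```python
# A sketch of the neighbor-pair scan. Type codes follow MiniGrid:
# 2 = wall, 4 = door. Returns raw grid indices (column of the vertical
# wall, row of the horizontal wall), not agent-relative steps.
def find_walls_by_pairs(image):
    def is_wall(x, y):
        return 0 <= x < 7 and 0 <= y < 7 and image[x][y][0] in (2, 4)

    wall_x = wall_y = None
    for x in range(7):
        for y in range(7):
            if not is_wall(x, y):
                continue
            # Two horizontally adjacent wall cells belong to the
            # horizontal wall, which fixes its row.
            if is_wall(x - 1, y) or is_wall(x + 1, y):
                wall_y = y
            # Two vertically adjacent wall cells belong to the
            # vertical wall, which fixes its column.
            if is_wall(x, y - 1) or is_wall(x, y + 1):
                wall_x = x
    return wall_x, wall_y
```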
I'll let @tcarta look at this :)
Here's a simpler algorithm than what I described above for locating walls:
```python
def find_walls(self, image):
    # image is the agent's 7x7 view; image[x][y][0] is the cell's object
    # type, where 2 = wall and 4 = door. The agent stands at (3, 6).
    wall_x = None
    wall_y = None
    # Scan rows from the far edge of the view toward the agent: the first
    # row containing two wall/door cells holds the horizontal wall.
    for y in range(7):
        c = 0
        for x in range(7):
            object_type = image[x][y][0]
            if object_type in (2, 4):
                c += 1
                if c == 2:
                    wall_y = y
                    break
        if c == 2:
            break
    # Scan columns from left to right: the first column containing two
    # wall/door cells holds the vertical wall.
    for x in range(7):
        c = 0
        for y in range(7):
            object_type = image[x][y][0]
            if object_type in (2, 4):
                c += 1
                if c == 2:
                    wall_x = x
                    break
        if c == 2:
            break
    # Convert grid indices to agent-relative offsets (steps right, steps
    # forward). Guard against a wall not being found, so the arithmetic
    # doesn't raise a TypeError on None.
    right = wall_x - 3 if wall_x is not None else None
    forward = 6 - wall_y if wall_y is not None else None
    return right, forward
```
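For completeness, here's a hypothetical usage snippet; the `obs["image"]` key and the sign convention (positive means right, negative means left) are my assumptions:

```python
# Hypothetical usage; obs["image"] is assumed to be the agent's 7x7 view.
wall_right, wall_forward = env.find_walls(obs["image"])
if wall_forward:  # skips None, and 0 (agent standing inside the horizontal wall)
    print(f"You see a wall {wall_forward} steps forward")
if wall_right:    # skips None, and 0 (agent standing inside the vertical wall)
    side = "right" if wall_right > 0 else "left"
    print(f"You see a wall {abs(wall_right)} steps {side}")
```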
In the original BabyAI environment, walls and closed doors would block the agent's view into the next room, while objects like boxes and balls never obstructed the agent's view of anything. But in BabyAI-Text, an object blocks the agent's view of a wall if the object is located directly between the agent and the wall. This doesn't seem like a faithful representation of the original BabyAI observations. Objects never block anything else in BabyAI-Text (other objects or doors), just walls.

A wall also disappears from view if the agent is directly aligned with a door, for instance if the door is 3 steps to the left, or 4 steps forward, etc.
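For what it's worth, the counting scan above shouldn't suffer from the first problem: even when an object hides the wall cell directly ahead, the row still contains other visible wall cells. A constructed example (my own, using MiniGrid's type codes 2 = wall, 6 = ball):

```python
import numpy as np

image = np.zeros((7, 7, 3), dtype=np.uint8)
image[:, 2, 0] = 2   # horizontal wall across row y = 2 (4 steps forward)
image[3, 2, 0] = 6   # a ball replaces the wall cell straight ahead of the agent
image[0, :, 0] = 2   # vertical wall down column x = 0 (3 steps left)

# self is unused, so the method can be called standalone with None.
print(find_walls(None, image))  # -> (-3, 4): wall 3 steps left, 4 steps forward
```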