flowersteam / Grounding_LLMs_with_online_RL

We perform functional grounding of LLMs' knowledge in BabyAI-Text
MIT License

Walls disappear if blocked by an object #20

Open rickyloynd-microsoft opened 9 months ago

rickyloynd-microsoft commented 9 months ago

In the original BabyAI environment, walls and closed doors would block the agent's view into the next room, while objects like blocks and balls never obstructed the agent's view of anything. But in BabyAI-Text, objects block the agent's view of a wall if the object is located directly between the agent and the wall. This doesn't seem like a faithful representation of the original BabyAI observations. Objects never block anything else in BabyAI-Text (either objects or doors), just walls.

A wall also disappears from view if the agent is directly aligned with a door, for instance, if the door is 3 blocks to the left, or 4 blocks forward, etc.

ClementRomac commented 9 months ago

Hi Ricky,

Thank you for spotting this.

A fix could be done, especially for objects blocking walls. However, we are not sure how to properly describe walls. In BabyAI, the same wall is seen in multiple cells of the grid. If we naively describe walls as we do with objects, we would end up with walls described for every cell they appear in (e.g. you see a wall 4 steps right, you see a wall 4 steps right and 1 step forward, you see a wall 4 steps right and 2 steps forward...).
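
To make the concern concrete, here is a minimal sketch of the naive per-cell approach. It is not the BabyAI-Text code: `naive_wall_descriptions` and `plural` are hypothetical helpers, and the sketch assumes a 7x7 view indexed as `image[x][y]`, the agent at `(3, 6)` facing toward `y = 0`, and type index 2 for walls.

```python
WALL = 2  # assumed wall type index (as in MiniGrid's OBJECT_TO_IDX)
AGENT_X, AGENT_Y = 3, 6  # assumed agent position in the 7x7 view


def plural(n):
    return f"{n} step" + ("" if n == 1 else "s")


def naive_wall_descriptions(image):
    """Emit one sentence per visible wall cell (hypothetical helper)."""
    lines = []
    for x in range(7):
        for y in range(7):
            if image[x][y][0] != WALL:
                continue
            dx, dy = x - AGENT_X, AGENT_Y - y
            parts = []
            if dx:
                parts.append(f"{plural(abs(dx))} {'right' if dx > 0 else 'left'}")
            if dy:
                parts.append(f"{plural(dy)} forward")
            lines.append("You see a wall " + " and ".join(parts))
    return lines


# Toy view: empty cells (type 1 taken as "empty") with one straight wall
# in the rightmost column.
view = [[[1, 0, 0] for _ in range(7)] for _ in range(7)]
for y in range(7):
    view[6][y][0] = WALL

for line in naive_wall_descriptions(view):
    print(line)
```

A single straight wall produces seven sentences here, one per cell, which is the repetition described above.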

For the record, we initially designed BabyAI-Text for single-room tasks. In these cases, the description of walls should not affect the agent much (even though it is not faithful to BabyAI, and any comparison between agents receiving the symbolic and the textual observations should be made with caution).

rickyloynd-microsoft commented 9 months ago

Good point, we definitely wouldn't want the text description to mention every cell belonging to a wall. Here's a different wall-detection algorithm that should solve all of these problems.

First we note a very special case, which is when the agent stands inside an open door, and BabyAI-Text currently says "You see a wall 1 step left" and "You see a wall 1 step right" (unless the agent has turned 90 degrees). In this special case it would be more informative to simply say "You stand in a red doorway" and say nothing about the wall to which that door belongs. The presence of the wall is implied by the doorway. This lets us avoid the tricky question of whether an agent sees a wall while standing in the wall's doorway.

Apart from this special case, and regardless of the number of rooms (1 or 9), we want the agent to always see exactly two walls, one running horizontally and one running vertically. (That's because each room is 6x6 in size, and the view is always 7x7.) We would like each of those walls to be described just once in the current format, such as "You see a wall 2 steps forward" and "You see a wall 3 steps right", regardless of how many cells (2 or more) each wall occupies.

To produce these results and prevent walls from disappearing from view, the algorithm would scan as usual over the 49 cells observed by the agent, and whenever it finds a wall cell, an inner loop would scan over that cell's 4 neighbors. If any neighbor is also a wall cell, we have two adjacent wall cells, and therefore a wall. The algorithm can immediately record that wall's position and orientation (horizontal or vertical). If the horizontal wall's position has already been identified and a new pair of wall cells is found running in that direction, no change is required, because the recorded position (like "2 steps forward") should match the one given by the new pair. Once the scan over all 49 cells is complete, each wall's position is uniquely identified and can be reported.
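
The neighbor scan described above could look roughly like the sketch below. It is only an illustration under stated assumptions: `find_walls_by_neighbors` is a hypothetical name, the view is indexed as `image[x][y]`, type index 2 is taken to mean wall, and positions are returned in view coordinates rather than as text.

```python
WALL = 2  # assumed wall type index (as in MiniGrid's OBJECT_TO_IDX)


def find_walls_by_neighbors(image, size=7):
    """Return (wall_x, wall_y) in view coordinates (hypothetical helper).

    wall_x is the column of the vertical wall, wall_y the row of the
    horizontal wall; either is None if no such wall is visible.
    """
    wall_x = wall_y = None
    for x in range(size):
        for y in range(size):
            if image[x][y][0] != WALL:
                continue
            # Inner loop over the 4 neighbors of this wall cell.
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if not (0 <= nx < size and 0 <= ny < size):
                    continue
                if image[nx][ny][0] != WALL:
                    continue
                if nx == x and wall_x is None:
                    wall_x = x  # neighbor in the same column: vertical wall
                if ny == y and wall_y is None:
                    wall_y = y  # neighbor in the same row: horizontal wall
    return wall_x, wall_y


# Toy view: empty cells (type 1 taken as "empty"), a vertical wall in
# column 5 and a horizontal wall in row 1.
view = [[[1, 0, 0] for _ in range(7)] for _ in range(7)]
for i in range(7):
    view[5][i][0] = WALL
    view[i][1][0] = WALL

print(find_walls_by_neighbors(view))
```

Because a position is recorded only the first time a pair is found, later pairs along the same wall leave it unchanged, as the description suggests. An isolated wall cell with no wall neighbor is ignored.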

ClementRomac commented 9 months ago

I'll let @tcarta look at this :)

rickyloynd-microsoft commented 9 months ago

Here's a simpler algorithm than what I described above for locating walls:

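    def find_walls(self, image):
        """Return (steps right, steps forward) to the two visible walls.

        An entry is None when no wall is found in that direction.
        """
        # Type indices 2 and 4 are wall and door in MiniGrid's OBJECT_TO_IDX.
        wall_types = (2, 4)
        wall_x = None
        wall_y = None
        # Horizontal wall: the first row holding at least two wall/door cells.
        for y in range(7):
            c = 0
            for x in range(7):
                if image[x][y][0] in wall_types:
                    c += 1
                    if c == 2:
                        wall_y = y
                        break
            if wall_y is not None:
                break
        # Vertical wall: the first column holding at least two wall/door cells.
        for x in range(7):
            c = 0
            for y in range(7):
                if image[x][y][0] in wall_types:
                    c += 1
                    if c == 2:
                        wall_x = x
                        break
            if wall_x is not None:
                break
        # Convert view coordinates to offsets from the agent at (3, 6),
        # leaving None for a wall that was not found.
        right = wall_x - 3 if wall_x is not None else None
        forward = 6 - wall_y if wall_y is not None else None
        return right, forward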
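
The pair returned by `find_walls` could then be turned into the existing sentence format. The sketch below is hypothetical (`describe_walls` and `steps` are not part of BabyAI-Text) and assumes the first value is a signed offset where positive means right, and the second is a non-negative forward offset.

```python
def steps(n):
    return f"{n} step" + ("" if n == 1 else "s")


def describe_walls(right, forward):
    """Build wall sentences from offsets (hypothetical helper).

    An offset of None (wall not visible) or 0 (agent inside the wall's
    doorway, the special case discussed earlier) produces no sentence.
    """
    lines = []
    if forward:
        lines.append(f"You see a wall {steps(forward)} forward")
    if right:
        side = "right" if right > 0 else "left"
        lines.append(f"You see a wall {steps(abs(right))} {side}")
    return lines


print(describe_walls(3, 2))
print(describe_walls(-1, None))
```

Each wall yields at most one sentence however many cells it covers, which is the behavior the thread is aiming for.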