askforalfred / alfred

ALFRED - A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
MIT License

Irreversible navigation actions? #98

Closed ikb-a closed 2 years ago

ikb-a commented 2 years ago

While trying to collect panoramic observations, I noticed something quite odd: navigation actions (specifically Look up/Look down) appear to be irreversible in some cases. When executing the expert trajectory for look_at_obj_in_light-Box-None-FloorLamp-212/trial_T20190908_193427_340509, there is a point in the trajectory, while the agent is carrying a box, where the agent can move forward into a position and can spin in place, but if it looks down, it can't look up again. The error I'm getting is "Cannot teleport due to hand object collision."

I'm only interacting with the environment using env.va_interact with smooth_nav=False so, as far as I can tell, I should be able to reverse the action without any problems (and based on the returned event, I can only assume that the look action is being mapped to TeleportFull somewhere in the back-end).

I've attached a zip file with a minimal example; you should be able to clone the repo, extract min_fail.zip in the root directory, and run it from the root directory with:

export PYTHONPATH="."
python3 min_fail/minimal.py

I did have to make three other changes to the repo's code, but I don't believe any of these should be making an impact:

  1. modify env/tasks.py to fully specify "from gen.graph" rather than "from graph"
  2. modify gen/graph/graph_obj.py to fully specify "import gen.constants as constants" rather than "import constants"
  3. modify X_DISPLAY = '1' rather than '0' in gen.constants

The minimal example code follows the expert demonstration up to a point, then looks down and attempts to look up, at which point the action fails and the code prints the error message "Cannot teleport due to hand object collision." If you set SPIN=True in the code, the agent will spin 360 degrees in place before looking up/down (that doesn't have any effect).

I think the error is due to the carried box colliding with the scenery, but if that's the case then I don't understand why the agent can move forward into this space (or turn on the spot) without triggering a collision.

Any clarifications as to why this navigational action is apparently irreversible (or if I'm making a mistake interacting with the environment!) would be very much appreciated -- Thanks for your time!

min_fail.zip

thomason-jesse commented 2 years ago

Thanks for raising this issue. Irreversible actions are considered an open problem introduced by the ALFRED benchmark. In the case you describe above, the agent is holding something so large it "clips" part of the environment in some looking directions. That particular problem is a consequence of the AI2THOR simulator challenge, but we are leaning into it as an additional hard problem for ALFRED models!

One way to handle panoramic observations in the face of failed actions is to treat the failed look direction as empty/blank, as we did recently in EmBERT (https://github.com/amazon-research/embert/blob/main/scripts/generate_maskrcnn_horizon0.py#L404).
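As a rough illustration of the blank-fill idea (not the EmBERT script itself), the sketch below collects one view per attempted action and substitutes an all-zero frame whenever the action fails. The `step` adapter, the `Frame` type, and the `scripted_step` toy simulator are hypothetical names introduced only for this example; in practice `step` would wrap whatever call is used to act in the environment.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Frame = List[List[int]]  # toy image: rows of pixel intensities

# Hypothetical adapter around the simulator: step(action) -> (success, frame).
Step = Callable[[str], Tuple[bool, Optional[Frame]]]


def blank_like(frame: Frame) -> Frame:
    """Return an all-zero frame with the same shape as `frame`."""
    return [[0] * len(row) for row in frame]


def observe_with_blanks(step: Step, current: Frame, actions: Sequence[str]) -> List[Frame]:
    """Collect the current view plus one view per action.

    A failed action contributes a blank frame instead of aborting the sweep.
    Caveat: after a failure the agent has not moved, so later slots may not
    correspond to the headings the caller intended.
    """
    views = [current]
    for action in actions:
        success, frame = step(action)
        views.append(frame if success and frame is not None else blank_like(current))
    return views


def scripted_step(outcomes: Sequence[bool]) -> Step:
    """Toy simulator: the i-th call succeeds iff outcomes[i] is True."""
    remaining = iter(outcomes)

    def step(action: str) -> Tuple[bool, Optional[Frame]]:
        ok = next(remaining)
        return ok, ([[1, 1], [1, 1]] if ok else None)

    return step
```

With `scripted_step([True, False, True])` and three `"RotateRight"` actions, the second attempted view comes back as a blank frame while the other views are kept.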

So, the upshot is: this is not a bug, and you didn't introduce it with your minimal changes; it's expected behavior from the AI2THOR simulator's collision detection.

ikb-a commented 2 years ago

@thomason-jesse Thanks for the quick response. The problem, though, is not that some angles can't be observed, but that the successful LookDown action cannot be undone. The code you linked asserts that the agent can return to its original position from before the panoramic sweep (i.e. the position the agent was in before the newly inserted actions). However, my minimal example seems to be a counterexample to that.

The agent can look down, but immediately afterwards it cannot look up to return to the position it was in earlier. I assume this isn't the desired behaviour since, in real life, any valid navigation action should be reversible (and the code you've linked to asserts that you can indeed return to the original position). If you could clarify whether being unable to return to a previous position is expected, it would be much appreciated. Thanks!
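The reversibility property in question can be checked mechanically: apply an action, apply its inverse, and see whether the undo succeeds. Below is a minimal sketch under stated assumptions: `step` is a hypothetical adapter returning `(success, error_message)`, the `INVERSE` table is an assumed pairing of the actions named in this thread, and `ToyAgent` is a toy stand-in that only mimics the failure reported here, not the simulator's actual collision logic.

```python
from typing import Callable, Dict, Optional, Tuple

# Assumed inverse pairs for the discrete navigation actions discussed here.
INVERSE: Dict[str, str] = {
    "LookDown": "LookUp",
    "LookUp": "LookDown",
    "RotateLeft": "RotateRight",
    "RotateRight": "RotateLeft",
}

# Hypothetical adapter: step(action) -> (success, error_message).
Step = Callable[[str], Tuple[bool, str]]


def probe_reversibility(step: Step, action: str) -> Tuple[bool, Optional[str]]:
    """Apply `action`, then its inverse; report whether the undo succeeded.

    If `action` itself fails there is nothing to undo, so the result is
    (True, None). Otherwise the result is (undo_succeeded, error_or_None).
    """
    ok, _ = step(action)
    if not ok:
        return True, None
    undone, err = step(INVERSE[action])
    return undone, (None if undone else err)


class ToyAgent:
    """Toy model: looking down always works, but looking back up fails
    while the agent holds a large object."""

    def __init__(self, holding_large_object: bool) -> None:
        self.holding_large_object = holding_large_object
        self.horizon = 0  # degrees below level; step size is arbitrary here

    def step(self, action: str) -> Tuple[bool, str]:
        if action == "LookDown":
            self.horizon += 15
            return True, ""
        if action == "LookUp":
            if self.holding_large_object:
                return False, "Cannot teleport due to hand object collision."
            self.horizon -= 15
            return True, ""
        return True, ""
```

Calling `probe_reversibility(ToyAgent(True).step, "LookDown")` reports the action as irreversible and returns the error string, while the same probe on `ToyAgent(False)` leaves the horizon back at its starting value.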

thomason-jesse commented 2 years ago

There are a couple of things going on here, but basically "yes, this is expected, annoying behavior."

I think the reason for this is that in AI2THOR v2.1.0, objects in hand change orientation depending on the angle of the agent's horizon. What's happening under the hood in your example is something like this: when the agent moves forward with the box, the forward motion does not trigger an object collision with the lamp. The agent looks down, and the angle of the box changes. When the agent tries to look back up, the box's orientation has changed such that it will now clip the lamp on the way back up to the original position.

In the EmBERT code, we get "around" this by basically ignoring it when the agent can't look up. You'll notice we first set the agent to LookUp to a zero horizon, but we don't check success. We hope the agent looked up, but if it didn't, we go ahead and try to grab a panorama at the 30 degree LookDown level. The teleport does fail sometimes! I don't remember the exact percentage, but we basically quietly drop these from the training data: https://github.com/amazon-research/embert/blob/main/scripts/generate_maskrcnn_horizon0.py#L532
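The pattern described above (an unchecked reset, a checked capture, and silently dropping failures) might look roughly like the sketch below. It is not taken from the EmBERT script: the `step` adapter, the placeholder action names `"reset_horizon"` and `"capture_at_30"`, and the `make_toy_step` helper are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Frame = List[List[int]]  # toy image: rows of pixel intensities

# Hypothetical adapter: step(action) -> (success, frame_or_None).
Step = Callable[[str], Tuple[bool, Optional[Frame]]]


def collect_or_drop(step: Step) -> Optional[Frame]:
    """Best-effort capture: return a frame, or None if the capture failed."""
    # Try to level the camera; the result is deliberately ignored.
    step("reset_horizon")
    # The capture itself is checked; a failure means "drop this sample".
    ok, frame = step("capture_at_30")
    return frame if ok else None


def build_dataset(steps: Sequence[Step]) -> List[Frame]:
    """Collect one sample per environment, quietly skipping failures."""
    samples = (collect_or_drop(step) for step in steps)
    return [sample for sample in samples if sample is not None]


def make_toy_step(capture_ok: bool) -> Step:
    """Toy simulator in which the reset always fails (as if a held box
    blocked it) and the capture succeeds only when `capture_ok` is True."""

    def step(action: str) -> Tuple[bool, Optional[Frame]]:
        if action == "reset_horizon":
            return False, None
        return (True, [[1]]) if capture_ok else (False, None)

    return step
```

Given three toy environments where only the middle capture fails, `build_dataset` returns two frames; the failed reset in each environment has no effect on the outcome because its result is never inspected.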

ikb-a commented 2 years ago

Ah, I see; that makes sense now. A big thanks for clearing this up!