askforalfred / alfred

ALFRED - A Benchmark for Interpreting Grounded Instructions for Everyday Tasks

Converting object positions to planner target poses #119

Closed: jesbu1 closed this issue 2 years ago

jesbu1 commented 2 years ago

Hi,

We're looking into writing reward functions based on the ALFRED dataset's annotated skills that work independently of expert trajectories (so that we can do RL), and everything is complete except for the skills corresponding to GotoLocation actions.

Here, we're having an issue with using x/y/z distances plus visibility checks to ensure the agent is near the object specified by a GotoLocation task (e.g. a table), so we instead wanted to convert the object positions into target agent poses for the planner.

We were thinking of converting object locations (x/y/z and rotation, for moveable objects within a scene) to the discrete pose used in the ground truth scene graph, then finding the nearest location in the graph that the agent can navigate to, and using the A* planning distance threshold.
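
Concretely, we were imagining something along these lines (an untested sketch; the helper name is ours, and the metadata key returned by GetReachablePositions differs across THOR versions):

```python
import math

# Hypothetical helper: snap an object's position to the nearest position the
# agent can actually stand at, using THOR's GetReachablePositions action.
def nearest_reachable_position(controller, obj_pos):
    event = controller.step(dict(action='GetReachablePositions'))
    # Depending on the THOR version, the result is under 'actionReturn' or
    # 'reachablePositions'.
    positions = (event.metadata.get('actionReturn')
                 or event.metadata.get('reachablePositions'))
    # Nearest navigable grid point to the object in the X/Z plane.
    return min(positions,
               key=lambda p: math.hypot(p['x'] - obj_pos['x'], p['z'] - obj_pos['z']))
```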

Given this, how do we actually convert the event metadata object poses to the target discrete locations as presented in the dataset? I'm not sure how to calculate a correct target agent pose from an object location. One issue is finding the right discrete orientation so that the agent is actually facing the object, and another is ensuring that the converted discrete object location is actually reachable.

Thanks!

jesbu1 commented 2 years ago

An example that we have issues with would be in a case like in the file attached here.

The instruction is to "...Face the plate." How would we ensure that we're facing the plate here? Right now this reward is calculated with a min (x, y) distance and a visibility check, but the object can be visible and close without us actually facing it, so it may make more sense to just calculate the discrete pose produced by the planner and try to match that pose.

Move_to_your_right_around_the_table_to_face_the_plate
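
In other words, the reward for a GotoLocation subgoal would reduce to a discrete pose match, roughly like this (pose tuple format assumed from the graph code):

```python
# Hypothetical pose-matching check: succeed only when the agent reaches the
# planner's target grid cell (within a small tolerance) and faces the same way.
def goto_pose_match(agent_pose, target_pose, pos_tolerance=1):
    ax, az, arot, _ahor = agent_pose      # (grid_x, grid_z, rotation_idx, horizon_idx)
    tx, tz, trot, _thor = target_pose
    close_enough = abs(ax - tx) + abs(az - tz) <= pos_tolerance  # in grid steps
    return close_enough and arot == trot
```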

MohitShridhar commented 2 years ago

@jesbu1, if I remember correctly, this is where the agent's pose is discretized. You can write an inverse function that converts discrete agent poses back to continuous poses, and then compare the continuous pose against object poses.
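
A rough sketch of what I mean (untested; it assumes the usual 0.25 m grid step, 90° rotation increments, and 15° horizon increments, so check gen/constants.py for the exact values):

```python
# Hypothetical inverse of the pose discretization: map the discrete
# (grid_x, grid_z, rotation_idx, horizon_idx) tuple back to continuous values.
def discrete_to_continuous(pose, grid_size=0.25, rot_step=90, hor_step=15):
    grid_x, grid_z, rot_idx, hor_idx = pose
    return {
        'x': grid_x * grid_size,                 # metres
        'z': grid_z * grid_size,
        'rotation': (rot_idx * rot_step) % 360,  # degrees, yaw
        'horizon': hor_idx * hor_step,           # degrees, camera pitch
    }
```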

Regarding the orientation and reachability issues: I'm not sure what the best solution is. You probably have to roll out actions in THOR to check whether the objects are actually visible and reachable.
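
For the rollout check, something along these lines should work (untested; action and argument names per THOR 2.1.0, adjust as needed):

```python
# Hypothetical check: teleport the agent to a candidate pose and verify that the
# teleport succeeds (pose is reachable) and that the target object is visible.
def pose_sees_object(controller, object_id, x, z, rotation, horizon=30, y=0.9):
    event = controller.step(dict(action='TeleportFull', x=x, y=y, z=z,
                                 rotation=rotation, horizon=horizon))
    if not event.metadata['lastActionSuccess']:
        return False  # the candidate pose itself is not reachable
    obj = next((o for o in event.metadata['objects']
                if o['objectId'] == object_id), None)
    return obj is not None and obj['visible']
```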

On a side note, the reward functions are just boilerplate code that doesn't really get used anywhere in the codebase, so feel free to modify them to your needs.

askforalfred commented 2 years ago

Hey Jesse, some of the code I wrote for the oracle camera controls in TEACh might be helpful here, for example:

https://github.com/GLAMOR-USC/teach_tatc/blob/master/src/teach/simulators/simulator_THOR.py#L572 (face a camera at an object at a location that the agent can also stand).

Note the use here of a discretized navigation graph for agent positions, from which we can retrieve the nearest valid agent positions from which to face the object. Getting the facing direction is simple geometry on the X and Z coordinates, and the elevation requires some trig against the object's Y coordinate and the agent height (~1.8 in the TEACh version of THOR, but I think it's shorter in ALFRED). Hope this helps you get started.
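
Roughly, the geometry looks like this (an untested sketch; the camera height and THOR's sign convention for the look angle are assumptions to double-check):

```python
import math

# Hypothetical facing computation: yaw from the X/Z offset to the object, camera
# pitch from the height difference, snapped to discrete turn/look increments.
def facing_pose(agent_pos, obj_pos, camera_height=1.575):
    dx = obj_pos['x'] - agent_pos['x']
    dz = obj_pos['z'] - agent_pos['z']
    dist = math.hypot(dx, dz)
    yaw = math.degrees(math.atan2(dx, dz)) % 360                          # 0 deg faces +Z
    pitch = math.degrees(math.atan2(camera_height - obj_pos['y'], dist))  # +ve looks down
    yaw_discrete = (round(yaw / 90.0) * 90) % 360    # 90 deg turns
    pitch_discrete = round(pitch / 15.0) * 15        # 15 deg look steps
    return yaw_discrete, pitch_discrete
```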

jesbu1 commented 2 years ago

Thanks a lot for the response, we'll look into it!

Jiahui-3205 commented 2 years ago

@MohitShridhar Hi Mohit, we're also trying to compute a position the agent can stand at while facing the object. However, we found that most objects in ALFRED don't have the "axisAlignedBoundingBox" property. Is there a way to get this property directly from the simulator, or some way we can compute it ourselves? Thanks!

MohitShridhar commented 2 years ago

@Jiahui-3205, as far as I know, axis-aligned bounding boxes are not part of the THOR 2.1.0 metadata. You probably have to compute them yourself.
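
One way to approximate it yourself is to back-project the object's instance mask through the depth frame into world coordinates and take the per-axis min/max. A rough, untested sketch (it assumes depth and instance rendering are enabled, a 90° field of view, planar depth in metres, a fixed camera offset above the agent position, and THOR's yaw/pitch sign conventions, all of which you should verify for your setup):

```python
import numpy as np

# Hypothetical AABB approximation from the visible part of the object.
def approx_aabb(event, object_id, fov_deg=90.0, camera_offset_y=0.675):
    mask = event.instance_masks.get(object_id)   # HxW bool mask (renderObjectImage=True)
    depth = event.depth_frame                    # HxW planar depth (renderDepthImage=True)
    if mask is None or not mask.any():
        return None
    h, w = depth.shape
    f = (h / 2.0) / np.tan(np.radians(fov_deg) / 2.0)   # focal length in pixels

    # Back-project the masked pixels into the camera frame (x right, y up, z forward).
    vs, us = np.nonzero(mask)
    z = depth[vs, us]
    x = (us - w / 2.0) * z / f
    y = (h / 2.0 - vs) * z / f
    pts = np.stack([x, y, z], axis=1)

    # Camera pose from the agent metadata: pitch about X, then yaw about Y.
    agent = event.metadata['agent']
    yaw = np.radians(agent['rotation']['y'])
    pitch = np.radians(agent['cameraHorizon'])           # positive = looking down
    rx = np.array([[1, 0, 0],
                   [0, np.cos(pitch), -np.sin(pitch)],
                   [0, np.sin(pitch), np.cos(pitch)]])
    ry = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                   [0, 1, 0],
                   [-np.sin(yaw), 0, np.cos(yaw)]])
    cam_pos = np.array([agent['position']['x'],
                        agent['position']['y'] + camera_offset_y,
                        agent['position']['z']])
    pts_world = (ry @ rx @ pts.T).T + cam_pos

    return pts_world.min(axis=0), pts_world.max(axis=0)   # (min_xyz, max_xyz)
```

Note this only covers the visible part of the object, so it underestimates the true box; aggregating points from a few viewpoints helps.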