allenai / ai2thor

An open-source platform for Visual AI.
http://ai2thor.allenai.org
Apache License 2.0

meaning of `ObjectID` attribute #999

Open nikita-petrashen opened 2 years ago

nikita-petrashen commented 2 years ago

Hi!

I'm working with ALFRED right now and need to directly manipulate AI2-THOR scenes (in the form of point clouds) according to the given tasks. How should I interpret the ObjectID string in the ALFRED task descriptions (see picture below) to know exactly which object we interact with?

image

In the issue I opened, the creators sent me here to ask this question, because this is an intrinsic THOR API mechanism.

Side question: if we set the state of the scene by sending the SetObjectPoses action, can we save the new state in a .unity file? If that functionality isn't there, maybe a few hints on how to add it? Would be great!

Thanks!

winthos commented 2 years ago

The ObjectID is generated for each sim object in a scene, and consists of the object's type followed by its position in world space.

Internally, this value is generated once when a new scene is opened in THOR, and is regenerated when the SetObjectPoses action is run. For example, an Apple in some scene could have a default position of (-02.08, 00.94, -03.62). If you just loaded this scene without doing any object rearrangement via actions like SetObjectPoses or InitialRandomSpawn, the ObjectID for this apple would be Apple|-02.08|+00.94|-03.62.

If, however, after initially loading up this scene, you ran SetObjectPoses to rearrange the apple to some new default location, say (0, 0, 0), then the apple's ObjectID would change to Apple|+00.00|+00.00|+00.00 since the new "default" or starting location of the apple has been set by SetObjectPoses.
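To make the format above concrete, here is a minimal Python sketch that builds and parses ObjectID strings. It assumes only the layout described in this comment (type, then x, y, z as signed, zero-padded values with two decimals, separated by `|`). The helper names `format_object_id` and `parse_object_id` are hypothetical and not part of the THOR API, and any segments after the three coordinates are simply kept as-is rather than interpreted.

```python
from typing import NamedTuple, Tuple


class ParsedObjectId(NamedTuple):
    object_type: str
    position: Tuple[float, float, float]
    extra: Tuple[str, ...]  # any trailing segments, left uninterpreted


def format_object_id(object_type, position):
    """Build an ID such as 'Apple|-02.08|+00.94|-03.62' (assumed format)."""
    # '+06.2f' = explicit sign, zero-padded to width 6, two decimals.
    return "|".join([object_type] + [f"{c:+06.2f}" for c in position])


def parse_object_id(object_id):
    """Split an ObjectID string into its type and the encoded position."""
    parts = object_id.split("|")
    if len(parts) < 4:
        raise ValueError(f"unexpected ObjectID layout: {object_id!r}")
    x, y, z = (float(p) for p in parts[1:4])
    return ParsedObjectId(parts[0], (x, y, z), tuple(parts[4:]))


before = format_object_id("Apple", (-2.08, 0.94, -3.62))
after = format_object_id("Apple", (0, 0, 0))
parsed = parse_object_id(before)
```

Keep in mind that the encoded position is the object's position at initialization (or after the last SetObjectPoses), not necessarily where the object currently is.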

Note that actions like PickupObject and PutObject do NOT change the ObjectID, as even though these can reposition objects, they are used for manipulation after a scene has been set to its desired default configuration. The SetObjectPoses action is explicitly meant to be used for such initialization, and should be considered as an extension of the initialization process on top of setting up the Controller with your desired parameters like field of view and resolution.

Since SetObjectPoses can change the ObjectID of objects, in order to track which object is being affected in a scene, there is another unique identifier in a sim object's metadata called the name. The name of an object is the type of the object appended with a unique string identifier that never changes, even when reloading the scene or repositioning the object via SetObjectPoses. This means you can use the object name to identify a specific object regardless of your initialization process.
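As a rough sketch of tracking an object by name, the snippet below looks an object up in two metadata snapshots. It works on plain dictionaries shaped like the per-object metadata (with `name` and `objectId` keys), so it runs without a simulator. The name `Apple_a1b2c3` and the helper `find_object_by_name` are made up for illustration.

```python
def find_object_by_name(objects, name):
    """Return the first metadata entry whose 'name' matches, else None."""
    for obj in objects:
        if obj["name"] == name:
            return obj
    return None


# Hypothetical snapshots of the object metadata before and after SetObjectPoses.
objects_before = [
    {"name": "Apple_a1b2c3", "objectId": "Apple|-02.08|+00.94|-03.62"},
]
objects_after = [
    {"name": "Apple_a1b2c3", "objectId": "Apple|+00.00|+00.00|+00.00"},
]

apple_before = find_object_by_name(objects_before, "Apple_a1b2c3")
apple_after = find_object_by_name(objects_after, "Apple_a1b2c3")

# The name is stable, so it maps the old ObjectID to the new one.
id_mapping = {apple_before["objectId"]: apple_after["objectId"]}
```

With a running controller, the list of objects would presumably come from the metadata of the last event rather than from hand-written dictionaries.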

This information and more is returned in the metadata that every interactable object in a scene has. The object metadata describes the state of the object, its position, name, current ObjectID, and more, so you can make sure you know exactly which object you are interacting with and in what way. Fields like pickupable and moveable also indicate the types of actions you can perform on the object. For example, something static like a Countertop cannot be picked up and moved by the agent, since it is a structure built into the side of a wall in a kitchen, but an object like a Book is pickupable. All this information is annotated in the object metadata, and further details on the types of interactions and the actions to call in order to perform them are on our documentation site.
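A small sketch of reading those fields: the function below splits objects by whether the agent could pick them up, using the `pickupable` flag mentioned above. The sample entries are hand-written stand-ins for real metadata, and `split_by_pickupable` is a hypothetical helper.

```python
def split_by_pickupable(objects):
    """Group object names by the value of their 'pickupable' flag."""
    pickupable, static = [], []
    for obj in objects:
        # Missing flags are treated as False here, which is an assumption.
        target = pickupable if obj.get("pickupable", False) else static
        target.append(obj["name"])
    return pickupable, static


# Hypothetical metadata entries with only the fields this sketch needs.
sample_objects = [
    {"name": "Book_1", "pickupable": True, "moveable": False},
    {"name": "CounterTop_1", "pickupable": False, "moveable": False},
]

can_pick_up, cannot_pick_up = split_by_pickupable(sample_objects)
```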

Additionally, there is currently not a way to save the unity scene file after running SetObjectPoses as an action. This would have to be custom functionality added to the Unity Editor itself, as you will need to interface with the Editor in order to save the scene as a new version of the scene after using something like SetObjectPoses. Currently this isn't exposed via the python interface as that only interacts with the build of Unity, not the editor itself. However, including the SetObjectPoses action as part of your initialization step will be functionally the same as saving the unity scene as a new scene and loading that, so unless you need the unity file itself as its own manipulatable asset, you should be able to replicate any scene configuration generated via SetObjectPoses by just running SetObjectPoses itself after initializing the controller.
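One way to treat SetObjectPoses as part of initialization is to record the poses you want from object metadata and replay them after every reset. The sketch below only builds the payload, so it runs without a simulator. It assumes each pose entry carries `objectName`, `position`, and `rotation`, and that only pickupable or moveable objects should be included. `build_object_poses` is a hypothetical helper, and the sample values are invented.

```python
def build_object_poses(objects):
    """Turn object metadata into a list of pose entries for SetObjectPoses."""
    poses = []
    for obj in objects:
        # Assumption: static structures are skipped, only movable things are posed.
        if not (obj.get("pickupable") or obj.get("moveable")):
            continue
        poses.append(
            {
                "objectName": obj["name"],
                "position": dict(obj["position"]),
                "rotation": dict(obj["rotation"]),
            }
        )
    return poses


# Hypothetical metadata entries.
sample_objects = [
    {
        "name": "Apple_a1b2c3",
        "pickupable": True,
        "position": {"x": 0.0, "y": 0.0, "z": 0.0},
        "rotation": {"x": 0.0, "y": 90.0, "z": 0.0},
    },
    {
        "name": "CounterTop_1",
        "pickupable": False,
        "moveable": False,
        "position": {"x": 1.0, "y": 0.9, "z": 2.0},
        "rotation": {"x": 0.0, "y": 0.0, "z": 0.0},
    },
]

poses = build_object_poses(sample_objects)

# With ai2thor installed, the replay step would look roughly like:
#   controller.reset(scene=...)
#   controller.step(action="SetObjectPoses", objectPoses=poses)
```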

nikita-petrashen commented 2 years ago

Thanks for the detailed answer Winson!

Regarding the last part of your answer, I need exactly the unity file itself. I've been digging through the C# source code and could not find where the loading of the unity file happens. I'm not really familiar with C#, but my guess is that in order to run SetObjectPoses and then save the scene as a unity file, I will have to implement a new ServerAction / DynamicServerAction which does the saving and then send the respective commands through the Python Controller. Is that the correct way?

Thanks!

nikita-petrashen commented 2 years ago

And the last question: when we call SetObjectPoses and pass the following dict:

image

Does the position entry correspond to the center of the bounding box or something else?

winthos commented 2 years ago

Regarding the last part of your answer, I need exactly the unity file itself. I've been digging through the C# source code and could not find where the loading of the unity file happens. I'm not really familiar with C#, but my guess is that in order to run SetObjectPoses and then save the scene as a unity file, I will have to implement a new ServerAction / DynamicServerAction which does the saving and then send the respective commands through the Python Controller. Is that the correct way?

So to actually edit and then save the scene file, this will need to be done without the python controller as this must be done from the Unity editor itself. The python interface is only meant to interact directly with a build of THOR from Unity, not the asset files themselves from within the Unity editor.

Since it seems like what you are wanting is to save some new .unity scene files after having made changes to them, what you will likely need to do is something like serialize out the information of the object poses that you would put through SetObjectPoses and then apply them to a scene, then save those changes as a new scene all from within the Unity Editor itself. You may need to do something like create an editor function that will essentially allow you to manipulate a scene as if you were using one of Unity's built-in dialogue boxes. This will allow you to make changes to a scene via a script, and then save the scene as a new .unity asset file.

This sort of leads into your other question.

And the last question: when we call SetObjectPoses and pass the following dict: Does position entry correspond to the center of the bounding box or something else?

This position corresponds to the position field of the game object's Transform component. First of all, there is the concept of a GameObject within Unity. Basically game objects are any objects that exist within a scene. This can be things like a character, environment objects, lights, sound emitters etc. Not all game objects have components on them that have a mesh, a renderer, etc, but all of them have a Transform component.

The Transform component represents a sort of pivot point that a game object can be manipulated from. Wherever this transform is centered about an object, all position and rotation changes are applied about that point.

The Transform center is not necessarily the center of the bounding box. Some objects have their transform at the average center of the object's mesh, but often the transform is centered about a different point, so this is not guaranteed.

Take for example these two objects, a statue and an apple. The apple's transform is centered around the average center of the apple object. Moving it moves the entire apple based on the center, and rotating the apple will change the Rotation field of the Transform by rotating it about the center of the apple. However, the statue's transform is centered closer to the base of the statue rather than the average center of the statue. This means when something like the rotation of the statue is modified within the statue's Transform it will rotate about its base where the transform is rather than about the center.

Apple

Statue
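To check how far an object's transform sits from its bounding-box center, you could compare two metadata fields. The sketch below assumes each object entry has a `position` (the Transform position) and an `axisAlignedBoundingBox` with a `center`. The numbers are invented to mirror the apple and statue cases above, and `pivot_offset` is a hypothetical helper.

```python
def pivot_offset(obj):
    """Vector from the Transform position to the bounding-box center."""
    position = obj["position"]
    center = obj["axisAlignedBoundingBox"]["center"]
    return {axis: center[axis] - position[axis] for axis in ("x", "y", "z")}


# Hypothetical values: the apple's pivot is at its center, the statue's is at its base.
apple = {
    "position": {"x": 1.0, "y": 1.0, "z": 2.0},
    "axisAlignedBoundingBox": {"center": {"x": 1.0, "y": 1.0, "z": 2.0}},
}
statue = {
    "position": {"x": 0.0, "y": 0.0, "z": 0.0},
    "axisAlignedBoundingBox": {"center": {"x": 0.0, "y": 0.5, "z": 0.0}},
}

apple_offset = pivot_offset(apple)
statue_offset = pivot_offset(statue)
```

A non-zero offset like the statue's means a position passed to SetObjectPoses places the pivot, not the visual center, at that point. If you work with point clouds, you would need to account for that offset yourself.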

One thing to potentially look into is recreating the logic used in the SetObjectPoses script, but integrating it into an editor function so that it works at "editor-time" rather than at runtime. SetObjectPoses will only execute in-editor if you hit the Play button at the top of the editor, going into runtime mode. However, any changes made to objects during runtime will not be saved in the scene, as you can only save the scene in "editor mode," not in run mode. So one option is to store all the position/rotation changes you want to make to objects in the scene in some serialized form, and then load them into the editor function to make changes that can be saved. You may also be able to make the changes in runtime mode, serialize them out, and then apply them again "for real" in editor mode.
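On the Python side, the "serialize out" step could be as simple as writing the pose list to a JSON file that an editor script reads later. This is only a sketch of that half. The file name, the pose layout, and the `save_poses` / `load_poses` helpers are assumptions, and the editor function that consumes the file would still have to be written in C# inside Unity.

```python
import json
import tempfile
from pathlib import Path


def save_poses(poses, path):
    """Write a list of pose dictionaries to a JSON file."""
    Path(path).write_text(json.dumps(poses, indent=2))


def load_poses(path):
    """Read the pose list back, e.g. to replay it through SetObjectPoses."""
    return json.loads(Path(path).read_text())


# Hypothetical pose list in the same shape passed to SetObjectPoses.
example_poses = [
    {
        "objectName": "Apple_a1b2c3",
        "position": {"x": 0.0, "y": 0.0, "z": 0.0},
        "rotation": {"x": 0.0, "y": 90.0, "z": 0.0},
    }
]

# Round-trip through a temporary file to show that the data survives unchanged.
with tempfile.TemporaryDirectory() as tmp_dir:
    pose_file = Path(tmp_dir) / "object_poses.json"
    save_poses(example_poses, pose_file)
    restored_poses = load_poses(pose_file)
```

JSON is used here because both Python and Unity can read it, but any format the editor script can parse would do.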

nikita-petrashen commented 2 years ago

Thanks a lot, this is very helpful!