allenai / ai2thor

An open-source platform for Visual AI.
http://ai2thor.allenai.org
Apache License 2.0

Scene top view #124

Closed edugm94 closed 4 years ago

edugm94 commented 5 years ago

Hi all, I wonder whether it is possible to obtain a top-view image of a scene in which I can see the different objects placed in it (sofas, tables, etc.). I want to represent the movements of an agent in the environment from a top-down point of view of the scene.

Thanks in advance!

winthos commented 5 years ago

Hi! We don't have a function that explicitly does that, but it is possible to create a top-down view by going into the Unity Editor and manipulating some objects in the Hierarchy. Open up the scene you need a top view of and either delete or disable the "Roof" object in the scene. Then find the Scene Gizmo (screenshot omitted), right-click it, and click "Top" to align the camera to a top-down view. Toggle "Perspective" on or off depending on whether you want a flat, map-like view or a more realistic perspective view.

Lucaweihs commented 5 years ago

Hi edugm94,

It's not officially supported, but check out the ToggleMapView action in the newest version of THOR (run it once to get a top-down view and again to get back to the normal view). Note: there currently seems to be a small bug with the makeAgentsVisible parameter of the initialize function, so you won't actually be able to see the agent (hopefully fixed in master soon).
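In case it's useful, the bare toggle looks roughly like this (a minimal sketch using the same dict-style step API and scene name as the full script below; the second call switches back to the normal first-person view):

from ai2thor.controller import Controller

c = Controller()
c.start()
c.reset("FloorPlan1_physics")

c.step({"action": "ToggleMapView"})    # first call: switch to the top-down camera
top_down_frame = c.last_event.frame    # RGB frame rendered from above
c.step({"action": "ToggleMapView"})    # second call: restore the agent's normal view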

Again, not officially supported, but here is some code I used in a recent project to add a triangle to such a top-down view representing the agent's position and viewing direction:

import copy
import math

import matplotlib

# Select the Tk backend before pyplot is imported later on.
matplotlib.use("TkAgg")

import numpy as np
from PIL import Image, ImageDraw

from ai2thor.controller import Controller

class ThorPositionTo2DFrameTranslator(object):
    """Maps THOR world (x, z) coordinates to (row, col) pixels in the top-down frame."""

    def __init__(self, frame_shape, cam_position, orth_size):
        self.frame_shape = frame_shape
        # Minimum (x, z) corner of the orthographic camera's view in world coordinates.
        self.lower_left = np.array((cam_position[0], cam_position[2])) - orth_size
        # The top-down camera's view spans 2 * orth_size world units (square frame assumed).
        self.span = 2 * orth_size

    def __call__(self, position):
        # Accept either a full (x, y, z) THOR position or an (x, z) pair.
        if len(position) == 3:
            x, _, z = position
        else:
            x, z = position

        # Normalize to [0, 1] within the camera's view, then convert to (row, col)
        # pixels; the row axis is flipped because image row 0 is at the top.
        camera_position = (np.array((x, z)) - self.lower_left) / self.span
        return np.array(
            (
                round(self.frame_shape[0] * (1.0 - camera_position[1])),
                round(self.frame_shape[1] * camera_position[0]),
            ),
            dtype=int,
        )

def position_to_tuple(position):
    return (position["x"], position["y"], position["z"])

def get_agent_map_data(c: Controller):
    # Toggle into the top-down map view, grab the frame plus the orthographic camera
    # metadata needed to map world coordinates onto it, then toggle back at the end.
    c.step({"action": "ToggleMapView"})
    cam_position = c.last_event.metadata["cameraPosition"]
    cam_orth_size = c.last_event.metadata["cameraOrthSize"]
    pos_translator = ThorPositionTo2DFrameTranslator(
        c.last_event.frame.shape, position_to_tuple(cam_position), cam_orth_size
    )
    to_return = {
        "frame": c.last_event.frame,
        "cam_position": cam_position,
        "cam_orth_size": cam_orth_size,
        "pos_translator": pos_translator,
    }
    c.step({"action": "ToggleMapView"})
    return to_return

def add_agent_view_triangle(
    position, rotation, frame, pos_translator, scale=1.0, opacity=0.7
):
    p0 = np.array((position[0], position[2]))
    p1 = copy.copy(p0)
    p2 = copy.copy(p0)

    # Convert the agent's yaw (degrees) to radians; the sign accounts for THOR's
    # rotation convention relative to image coordinates.
    theta = -2 * math.pi * (rotation / 360.0)
    rotation_mat = np.array(
        [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
    )
    # Two points offset from the agent's position (rotated by its heading) that,
    # together with p0, form a triangle indicating where the agent is facing.
    offset1 = scale * np.array([-1, 1]) * math.sqrt(2) / 2
    offset2 = scale * np.array([1, 1]) * math.sqrt(2) / 2

    p1 += np.matmul(rotation_mat, offset1)
    p2 += np.matmul(rotation_mat, offset2)

    img1 = Image.fromarray(frame.astype("uint8"), "RGB").convert("RGBA")
    # PIL expects the size as (width, height), so swap the numpy (height, width) order.
    img2 = Image.new("RGBA", (frame.shape[1], frame.shape[0]))

    opacity = int(round(255 * opacity))  # Transparency of the overlaid triangle.
    # pos_translator returns (row, col); PIL's polygon wants (x, y) = (col, row).
    points = [tuple(reversed(pos_translator(p))) for p in [p0, p1, p2]]
    draw = ImageDraw.Draw(img2)
    draw.polygon(points, fill=(255, 255, 255, opacity))

    img = Image.alpha_composite(img1, img2)
    return np.array(img.convert("RGB"))

if __name__ == "__main__":
    import matplotlib.pyplot as plt

    c = Controller()
    c.start()
    c.reset("FloorPlan1_physics")

    t = get_agent_map_data(c)
    new_frame = add_agent_view_triangle(
        position_to_tuple(c.last_event.metadata["agent"]["position"]),
        c.last_event.metadata["agent"]["rotation"]["y"],
        t["frame"],
        t["pos_translator"],
    )
    plt.imshow(new_frame)
    plt.show()
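    # Optional: save the overlay to disk as well (the filename here is just an example).
    Image.fromarray(new_frame).save("top_down_with_agent.png")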

edugm94 commented 5 years ago

Thanks so much for the support @Lucaweihs :)

I want to use the code you gave me on some scenes that I cannot find in AI2-THOR, although they were available in the past. I asked the author, but he told me to contact the developers of the environment. Here is the repository with the files I am referring to: https://github.com/caomw/icra2017-visual-navigation-1. Is there any way to access those scenes through AI2-THOR?

Thanks so much in advance!

roozbehm commented 5 years ago

Unfortunately, due to some copyright issues, we cannot release those scenes. We recommend using the current set of scenes.

mattdeitke commented 4 years ago

Merging the discussion on top-down views into #445.