allenai / ai2thor

An open-source platform for Visual AI.
http://ai2thor.allenai.org
Apache License 2.0

ai2thor with gym #416

Open zyzhang1130 opened 4 years ago

zyzhang1130 commented 4 years ago

Hi, may I ask if there is a simple way to make ai2thor work as a gym environment, such that setting it up could be done with env = gym.make(args.env_id) where env_id refers to ai2thor? I would really appreciate any guidance on this issue.

roozbehm commented 4 years ago

There are some gym wrappers for THOR. Here is an example: https://github.com/TheMTank/cups-rl. It was developed by another group, and I am not sure whether it is compatible with the latest version of THOR.

zyzhang1130 commented 4 years ago

Hi, thank you for the prompt reply. I have actually tried it, but in cups-rl the env wrapper is coupled with the RL algorithm. I tried to decouple them, but it seems non-trivial (I need to exclude the RL algorithm for my use case).

mattdeitke commented 4 years ago

Hey!

I've wanted to get around to supporting a gym environment for a while. The problem is that AI2-THOR is so flexible across different tasks that there's no "one size fits all" solution of the kind most gym environments provide. For example, people might want to navigate to TVs with RoboTHOR, navigate to faucets in bathrooms with iTHOR, interact with bread in kitchens with iTHOR, etc.

So the one thing we can provide is a parent class with some default initializations for scenes and actions, plus the flexibility for users to tweak those defaults for their specific needs. For example, users should easily be able to change the scenes they work with, the actions, the observations, the reward function, etc.

Below, I've provided a gym controller for ObjectNav within RoboTHOR if you want to get started right away. Just read over the functions and change them as you see fit.

from ai2thor.controller import Controller
import gym
from gym.spaces import Discrete, Box
from collections import defaultdict
from typing import List, Tuple, Dict
import numpy as np
import random

class GymController:
    """An object navigation RoboTHOR Gym Environment.

    Observations are RGB images in the form of Box(height, width, 3)
        with values from [0:1]. 

    Actions are Discrete(4); that is, (0, 1, 2, 3). Here,
        0 = 'RotateLeft'
        1 = 'MoveAhead'
        2 = 'RotateRight'
        3 = 'Done' (see https://arxiv.org/pdf/1807.06757.pdf)
    """

    valid_target_objects = {
        'AlarmClock',
        'Apple',
        'BaseballBat',
        'BasketBall',
        'Bowl',
        'GarbageCan',
        'HousePlant',
        'Laptop',
        'Mug',
        'RemoteControl',
        'SprayBottle',
        'Television',
        'Vase'
    }

    def __init__(self,
            target_object: str='Television',
            controller_properties: dict=dict(),
            horizon: int=200,
            seed: int=42):
        """Initializes a new AI2-THOR Controller and gym environment.

        Args:
            'target_object' (str='Television'): The name of the target object. See
                https://ai2thor.allenai.org/robothor/documentation/#object-types-
                for more options.

            'controller_properties' (dict=dict()): The properties used to initialize
                an AI2-THOR Controller object. For more information, see:
                https://ai2thor.allenai.org/robothor/documentation/#initialization.

            'horizon' (int=200): Maximum number of time steps before failure.

            'seed' (int=42): The random seed used for reproducing results.
        """
        assert target_object in self.valid_target_objects, \
            'Invalid target object, see https://ai2thor.allenai.org/robothor/documentation/#object-types-.'
        self.controller = Controller(**controller_properties)

        # helper fields
        self.target_object = target_object
        self.current_time_step = 0
        self.episode_already_done = False
        self.horizon = horizon

        # set scenes and get reachable positions for each scene
        self._scene_names = self.scene_names()
        self.reachable_positions = self._get_all_reachable_positions()

        # for reset rotations
        self.rotateStepDegrees = controller_properties.get('rotateStepDegrees', 90)

        # gym observation space dimensions
        self.width = controller_properties.get('width', 300)
        self.height = controller_properties.get('height', 300)

        # set random seeds
        self._seed(seed)

    @property
    def observation_space(self) -> gym.spaces:
        """Returns the gym.spaces observation space."""
        return Box(low=0, high=1, shape=(self.height, self.width, 3), dtype=np.float64)

    @property
    def action_space(self) -> gym.spaces:
        """Returns the gym.spaces action space."""
        return Discrete(4)

    def scene_names(self) -> List[str]:
        """Returns a list of the RoboTHOR training scene names.

        For more information on RoboTHOR scenes, see:
        https://ai2thor.allenai.org/robothor/documentation/#training-scenes
        """
        scenes = []
        for wall_config in range(1, 13):
            for object_config in range(1, 6):
                scenes.append(f'FloorPlan_Train{wall_config}_{object_config}')
        return scenes

    def step(self, action: int) -> Tuple[np.array, float, bool, dict]:
        """Takes a step in the AI2-THOR environment.

        If the episode is already done, the action is not executed
            and (None, None, True, None) is returned.

        Actions are Discrete(4); that is, (0, 1, 2, 3). Here,
            0 = 'RotateLeft'
            1 = 'MoveAhead'
            2 = 'RotateRight'
            3 = 'Done' (see https://arxiv.org/pdf/1807.06757.pdf)

        Returns a tuple of (
            observation (np.array): The agent's observation after taking
                a step. This is updated in get_observation(),

            reward (float): The reward from the environment after
                taking a step. This is updated in reward_function().

            done (bool): True if the episode ends after the action,

            metadata (dict): The metadata after taking the action. See
                https://ai2thor.allenai.org/robothor/documentation/#metadata
                for more information.
        )
        """
        assert action in self.action_space, 'Invalid action'
        if self.episode_already_done:
            return None, None, True, None

        if action == 0:
            self.controller.step(action='RotateLeft')
        elif action == 1:
            self.controller.step(action='MoveAhead')
        elif action == 2:
            self.controller.step(action='RotateRight')

        self.current_time_step += 1
        done = self.episode_done(done_action_called=action==3)

        return (
            self.get_observation(),
            self.reward_function(),
            done,
            self.controller.last_event.metadata
        )

    def reward_function(self) -> float:
        """Returns 1 if the episode is a success and done, otherwise -1."""
        return 1. if self.episode_success() else -1.

    def episode_success(self) -> bool:
        """Returns True if the episode is done and a target object is visible."""
        if self.episode_already_done:
            objects = self.controller.last_event.metadata['objects']
            for obj in objects:
                if obj['objectType'] == self.target_object and obj['visible']:
                    return True
        return False

    def episode_done(self, done_action_called: bool=False) -> bool:
        """Returns True if the episode is done.

        Args:
            'done_action_called' (bool=False): Did the agent call the Done
                action? For embodied navigation, it is recommended that the
                agent calls a 'Done' action when it believes it has
                finished its task. For more information, see
                https://arxiv.org/pdf/1807.06757.pdf.
        """
        self.episode_already_done = self.episode_already_done or \
            done_action_called or self.current_time_step > self.horizon
        return self.episode_already_done

    def get_observation(self) -> np.array:
        """Returns the normalized RGB image frame from THOR."""
        rgb_image = self.controller.last_event.frame
        return rgb_image / 255

    def reset(self) -> np.array:
        """Resets the agent to a random position/rotation in a random scene
           and returns an initial observation."""
        self.episode_already_done = False

        # choose a random scene
        scene = random.choice(self._scene_names)
        self.controller.reset(scene)

        # set a random initial position
        rand_xyz_pos = random.choice(self.reachable_positions[scene])

        # note that np.arange works with decimals, while range doesn't
        rand_yaw = random.choice(np.arange(0, 360, self.rotateStepDegrees))

        self.controller.step(action='TeleportFull',
            rotation=dict(x=0.0, y=rand_yaw, z=0.0),
            **rand_xyz_pos
        )

        return self.get_observation()

    def close(self):
        """Ends the controller's session."""
        self.controller.stop()

    def _get_all_reachable_positions(self) -> Dict[str, Dict[str, float]]:
        """Sets the reachable positions for each scene in 'scene_names()'."""
        reachable_positions = dict()
        for scene in self._scene_names:
            self.controller.reset(scene)
            event = self.controller.step(action='GetReachablePositions')
            reachable_positions[scene] = event.metadata['reachablePositions']
        return reachable_positions

    def _seed(self, seed_num: int=42):
        """Sets the random seed for reproducibility."""
        random.seed(seed_num)

    def render(self, mode=None) -> None:
        """Provides a warning that render doesn't need to be called for AI2-THOR.

        We have provided it in case somebody copies and pastes code over
        from OpenAI Gym."""
        import warnings
        warnings.warn('The render function call is unnecessary for AI2-THOR.')

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.controller.stop()

Then, to use the environment, run

with GymController() as env:
    for i_episode in range(20):
        observation = env.reset()
        for t in range(100):
            action = env.action_space.sample()
            observation, reward, done, metadata = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break

or use the standard gym syntax

env = GymController()
for i_episode in range(20):
    observation = env.reset()
    for t in range(100):
        action = env.action_space.sample()
        observation, reward, done, metadata = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break
env.close()
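
For instance, a minimal sketch of the kind of customization mentioned above (a hypothetical subclass with made-up reward values, reusing the GymController defined earlier) could look like

from typing import List

class FindAppleController(GymController):
    """A hypothetical variant that navigates to an Apple with a shaped reward."""

    def __init__(self, **kwargs):
        super().__init__(target_object='Apple', **kwargs)

    def reward_function(self) -> float:
        # small per-step penalty, large bonus on success (sketch only)
        return 10. if self.episode_success() else -0.01

    def scene_names(self) -> List[str]:
        # restrict training to the first wall configuration of the RoboTHOR train scenes
        return [f'FloorPlan_Train1_{i}' for i in range(1, 6)]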

Expect a follow-up and some new documentation soon, on how one would inherit and create a gym controller from the native AI2-THOR API!

If you have any suggestions, definitely feel free to chime in. I'd expect some style changes to what is shown above, but nothing too dramatic.

Hope that helps! Matt :)

zyzhang1130 commented 4 years ago

@mattdeitke Wow I found it super helpful! I'll explore further within the context of my task and see how it goes. Thanks a lot man!

zyzhang1130 commented 4 years ago

@mattdeitke Hi, may I check what the shape of the action_space is supposed to be when it is defined with Discrete(), and how to read it? In envs such as mujoco, the action_space is of class 'gym.spaces.box.Box' and its shape is a tuple of the form (3,) (as an example). Can I define the action_space for ai2thor as 'gym.spaces.box.Box' as well? Thanks for replying.

mattdeitke commented 4 years ago

Yeah, it should work with all gym.spaces. Just make sure that you update the step method as well. As a quick rundown, Box is usually used for specifying a continuous quantity, where one could define the action space as

from gym.spaces import Box
import numpy as np
...
    @property
    def action_space(self) -> gym.spaces:
        """Returns the gym.spaces action space."""
        return Box(low=0, high=1, shape=(3,), dtype=np.float64)

for 3 values between [0:1]. Also, make sure that you change the step function to something like

    def step(self, action: Tuple[float, float, float]) -> Tuple[np.array, float, bool, dict]:
        """Takes a step in the AI2-THOR environment.

        If the episode is already done, the action is not executed
            and (None, None, True, None) is returned.

        Actions are Box(low=0, high=1, shape=(3,), dtype=np.float64). Here,
            action[0]: Rotates the agent from its current position between [0:1] * 360 degrees.
            action[1]: Moves the agent [0:1] meters from its current position and newly
                           facing direction (as changed with action[0]). If moving in this magnitude
                           results in a collision, then the position is not changed.
            action[2]: Changes the camera horizon to be between [-30:30] degrees with [0:1] * 60 - 30.

        Returns a tuple of (
            observation (np.array): The agent's observation after taking
                a step. This is updated in get_observation(),

            reward (float): The reward from the environment after
                taking a step. This is updated in reward_function().

            done (bool): True if the episode ends after the action,

            metadata (dict): The metadata after taking the action. See
                https://ai2thor.allenai.org/robothor/documentation/#metadata
                for more information.
        )
        """
        assert action in self.action_space, 'Invalid action'
        if self.episode_already_done:
            return None, None, True, None

        r = self.controller.last_event.metadata['agent']['rotation']
        p = self.controller.last_event.metadata['agent']['position']

        self.controller.step(action='TeleportFull',
            x=p['x'], y=p['y'], z=p['z'],   # same position
            rotation=dict(x=r['x'], y=r['y'] + action[0] * 360, z=r['z']),
            horizon=action[2] * 60 - 30)

        self.controller.step(action='MoveAhead', moveMagnitude=action[1])

        self.current_time_step += 1
        done = raise NotImplementedError()

        return (
            self.get_observation(),
            self.reward_function(),
            done,
            self.controller.last_event.metadata
        )

Note that for this to work, make sure to initialize continuous=True, as in

from ai2thor.controller import Controller
c = Controller(continuous=True)

Footnote 1.

In a multi-agent case, you may want to use MultiDiscrete or MultiBinary to specify multiple different discrete actions.
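
For instance, a sketch of a two-agent setup where each agent picks one of the four discrete actions above (purely illustrative):

from gym.spaces import MultiDiscrete

# two agents, each choosing one of the 4 discrete actions per step
multi_agent_action_space = MultiDiscrete([4, 4])
multi_agent_action_space.sample()  # e.g. array([2, 0])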


Footnote 2.

For more complex environments, you may want to use spaces.Tuple or spaces.Dict. These are definitely the most flexible. Here's a Tuple example and here's a Dict example.

Aside: since both typing.Tuple and spaces.Tuple would conflict upon import, I'd probably rename the gym spaces with

from gym.spaces import Tuple as GymTuple
from gym.spaces import Dict as GymDict
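
As a purely illustrative sketch (hypothetical field names), a dictionary observation space pairing the RGB frame with the index of the target object could be

from gym.spaces import Dict as GymDict
from gym.spaces import Box, Discrete
import numpy as np

observation_space = GymDict({
    'rgb': Box(low=0, high=1, shape=(300, 300, 3), dtype=np.float64),
    'target_object': Discrete(13),  # index into the 13 valid target objects
})
observation_space.sample()  # OrderedDict with 'rgb' and 'target_object' entries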

Footnote 3.

Debug and sample from the space with

a = Box(low=0, high=1, shape=(3,), dtype=np.float64)
a.sample()  # np array of 3 sampled values
a.low  # lowest value
a.high  # highest value

Matt :)

zyzhang1130 commented 4 years ago

@mattdeitke Thank you again for your prompt and informative reply! I have a couple of questions though:

Why does the Box implementation have done = raise NotImplementedError() while the discrete implementation has done = self.episode_done(done_action_called=action==3)? That is, why is done an error now? Isn't it a legitimate action? Also, my IDE (Spyder) flags done = raise NotImplementedError() as a syntax error.

Thanks again for replying.

mattdeitke commented 4 years ago

Hey, sorry for the miscommunication. done raises a NotImplementedError because I wasn't sure how you wanted the episode to end (so I didn't implement it). In the discrete case, there's an action dedicated to Done, but in the continuous/Box case, choosing when to stop based on the action might be a bit harder.

One possible solution, for example, would be to set

done = action[0] > 0.9

to map the first continuous action to the done action.
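
That check could then slot into the Box step() in place of the NotImplementedError line, reusing episode_done() from the discrete version (sketch only; the threshold is arbitrary):

done_action_called = action[0] > 0.9  # hypothetical stop threshold
done = self.episode_done(done_action_called=done_action_called)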

Matt.

zyzhang1130 commented 4 years ago

@mattdeitke noted with thanks.