gnns4hri / SocNavGym

GNU General Public License v3.0

SocNavGym: An environment for Social Navigation

Table of Contents

  1. Description
  2. Installation
  3. Usage
  4. Sample Code
  5. About the environment
  6. Conventions
  7. Observation Space
  8. Action Space
  9. Info Dict
  10. Reward Function
  11. Writing Custom Reward Functions
  12. Config File
  13. Wrappers
  14. Training Agents
  15. Evaluating Agents
  16. Manually Controlling the Robot
  17. Tutorials

Description

This repository contains the implementation of our paper "SocNavGym: A Reinforcement Learning Gym for Social Navigation", published at IEEE RO-MAN 2023.

Installation

  1. Install Python-RVO2 by following the instructions in its repository.
  2. Install DGL (Deep Graph Library) for your system by following its official installation instructions.
  3. For installing the environment using pip:

    python3 -m pip install socnavgym

    For installing from source:

    git clone https://github.com/gnns4hri/SocNavGym.git
    python3 -m pip install .  # to install the environment to your Python libraries. This is optional. If you don't run this, then just make sure that your current working directory is the root of the repository when importing socnavgym.
  4. The Deep RL agents are written using Stable Baselines3. We used the following command to install SB3 for our experiments:
    pip install git+https://github.com/carlosluis/stable-baselines3@fix_tests

    This is NOT required for the environment to run. If you are going to use stable_dqn.py, we recommend installing SB3 with the command above. Installing stable-baselines3 normally through pip might work, but we cannot guarantee it since it was not tested.

Usage

import socnavgym
import gym
env = gym.make('SocNavGym-v1', config="<PATH_TO_CONFIG>")  

Sample Code

import socnavgym
import gym
env = gym.make("SocNavGym-v1", config="./environment_configs/exp1_no_sngnn.yaml") 
obs, _ = env.reset()

for i in range(1000):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    env.render()
    if terminated or truncated:
        env.reset()

About the environment

SocNavGym-v1 is a highly customisable environment whose parameters can be controlled through config files. A few example config files are provided in environment_configs/. For a better understanding of each parameter, refer to this section. Apart from the robot, the environment supports entities such as plants, tables and laptops. It also models human-human and human-laptop interactions, and can contain both moving and static crowds; new crowds and interactions can form, and existing ones can disperse. The environment follows the OpenAI Gym format, implementing the step, render and reset functions, and uses the latest Gym API (gym 0.26.2).

Conventions

Observation Space

The observation returned when env.step(action) is called consists of the following (all values are in the robot's frame unless you use the WorldFrameObservations wrapper):

The observation is of the type gym.Spaces.Dict. The dictionary has the following keys:

  1. "robot" : This is a vector of shape (9,) of which the first six values represent the one-hot encoding of the robot, i.e [1, 0, 0, 0, 0, 0]. The next two values represent the goal's x and y coordinates in the robot frame and the last value is the robot's radius.

  2. The other keys present in the observation are "humans", "plants", "laptops", "tables" and "walls". Every entity (human, plant, laptop, table, or wall) has an observation vector with the structure given below:

    | Index | Field | Group |
    |-------|-------|-------|
    | 0-5 | enc0 ... enc5 | Encoding (one-hot) |
    | 6 | x | Relative position coordinates |
    | 7 | y | Relative position coordinates |
    | 8 | sin(theta) | Relative orientation |
    | 9 | cos(theta) | Relative orientation |
    | 10 | radius | Radius |
    | 11 | relative speed | Relative speeds |
    | 12 | relative angular speed | Relative speeds |
    | 13 | gaze | Gaze |

    Details of the field values:

    • One hot encodings of the object.

      The one hot encodings are as follows:

      • human: [0, 1, 0, 0, 0, 0]
      • table: [0, 0, 1, 0, 0, 0]
      • laptop: [0, 0, 0, 1, 0, 0]
      • plant: [0, 0, 0, 0, 1, 0]
      • wall: [0, 0, 0, 0, 0, 1]
    • x, y coordinates relative to the robot. For rectangular objects, the coordinates correspond to the centre of geometry.

    • theta: the orientation with respect to the robot.

    • radius: radius of the object. For rectangular objects, this is the radius of the circle that circumscribes the rectangle.

    • relative translational speed: the magnitude of the entity's velocity relative to the robot.

    • relative angular speed: computed as the difference between the entity's orientation across two consecutive time steps, divided by the time step.

    • gaze value: for humans, it is 1 if the robot lies in the human's line of sight, and 0 otherwise. For entities other than humans, the gaze value is 0. The robot is in a human's line of sight if it lies between -gaze_angle/2 and +gaze_angle/2 in the human's frame. The gaze angle can be changed through the gaze_angle parameter in the config file.

    The observation vectors of all entities of the same type are concatenated into a single vector and placed under the corresponding key in the dictionary. For example, if there are 4 humans, the four vectors of shape (14,) are concatenated into a vector of shape (56,), which is stored under the "humans" key of the observation dictionary. Individual observations can be accessed by simply reshaping the observation to (-1, 14), as shown in the example after this list.

    For walls, each wall is segmented into smaller walls of size wall_segment_size (which can be found in the config). Observations from each segment are returned in obs["walls"].

  3. The observation space of the environment can be obtained by calling env.observation_space.
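
For instance, here is a small sketch of reading per-entity observations (the config path is just one of the bundled examples; the indices follow the table above):

import gym
import socnavgym

env = gym.make("SocNavGym-v1", config="./environment_configs/exp1_no_sngnn.yaml")
obs, _ = env.reset()

# goal coordinates in the robot frame (indices 6 and 7 of the "robot" vector)
goal_x, goal_y = obs["robot"][6], obs["robot"][7]

# each human contributes a 14-dimensional vector; reshape to access them individually
humans = obs["humans"].reshape(-1, 14)
for i, h in enumerate(humans):
    rel_x, rel_y, gaze = h[6], h[7], h[13]  # position relative to the robot, gaze flag
    print(f"human {i}: x={rel_x:.2f}, y={rel_y:.2f}, gaze={gaze}")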

Action Space

The action space for a holonomic robot consists of three components: vx, vy and va, where the X axis points along the robot's heading direction. For differential-drive robots, the vy component is 0. You can control the type of the robot using the config file's robot_type parameter. All three components take a value between -1 and 1, which is then mapped to the corresponding speed using the maxima set in the config file. If you want to use a discrete action space, you can use the DiscreteActions wrapper (see Wrappers below).
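
As a minimal sketch (assuming the continuous action space described above, where each component is normalised to [-1, 1] and scaled by max_advance_robot / max_rotation from the config):

import gym
import numpy as np
import socnavgym

env = gym.make("SocNavGym-v1", config="./environment_configs/exp1_no_sngnn.yaml")
env.reset()

# [vx, vy, va] in [-1, 1]; vy is ignored (treated as 0) for a differential-drive robot
action = np.array([1.0, 0.0, 0.0], dtype=np.float32)  # drive straight ahead at full speed
obs, reward, terminated, truncated, info = env.step(action)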

Info Dict

The environment also returns meaningful metrics at every step in an episode. The following table describes each metric that is returned in the info dict.

| Metric | Description |
|--------|-------------|
| "OUT_OF_MAP" | Boolean value indicating whether the robot went out of the map |
| "COLLISION_HUMAN" | Boolean value indicating whether the robot collided with a human |
| "COLLISION_OBJECT" | Boolean value indicating whether the robot collided with an object |
| "COLLISION_WALL" | Boolean value indicating whether the robot collided with a wall |
| "COLLISION" | Boolean value indicating whether the robot collided with any entity |
| "SUCCESS" | Boolean value indicating whether the robot reached the goal |
| "TIMEOUT" | Boolean value indicating whether the episode terminated because the maximum number of steps was reached |
| "FAILURE_TO_PROGRESS" | Number of timesteps in which the robot failed to reduce its distance to the goal |
| "STALLED_TIME" | Number of timesteps for which the robot's velocity was 0 |
| "TIME_TO_REACH_GOAL" | Number of timesteps taken by the robot to reach its goal |
| "STL" | Success weighted by time length |
| "SPL" | Success weighted by path length |
| "PATH_LENGTH" | Total path length covered by the robot |
| "V_MIN" | Minimum velocity reached by the robot |
| "V_AVG" | Average velocity of the robot |
| "V_MAX" | Maximum velocity reached by the robot |
| "A_MIN" | Minimum acceleration reached by the robot |
| "A_AVG" | Average acceleration of the robot |
| "A_MAX" | Maximum acceleration reached by the robot |
| "JERK_MIN" | Minimum jerk reached by the robot |
| "JERK_AVG" | Average jerk of the robot |
| "JERK_MAX" | Maximum jerk reached by the robot |
| "TIME_TO_COLLISION" | Minimum time to collision with any human at any point in the trajectory, assuming the robot and all humans keep moving along linear trajectories |
| "MINIMUM_DISTANCE_TO_HUMAN" | Minimum distance to any human |
| "PERSONAL_SPACE_COMPLIANCE" | Percentage of steps in which the robot is not within the personal space (0.45 m) of any human |
| "MINIMUM_OBSTACLE_DISTANCE" | Minimum distance to any object |
| "AVERAGE_OBSTACLE_DISTANCE" | Average distance to any object |

Some additional metrics that are also provided are :

| Metric | Description |
|--------|-------------|
| "DISCOMFORT_SNGNN" | SNGNN value (more about SNGNN in the section below) |
| "DISCOMFORT_DSRNN" | DSRNN reward value (more about the DSRNN reward function in the section below) |
| "sngnn_reward" | SNGNN value - 1 |
| "distance_reward" | Value of the distance reward |

Note that the above four values are returned correctly only if the reward function parameter in the config file is "dsrnn" or "sngnn". If a custom reward function is used, the user is required to fill in these values; otherwise zeros are returned for them. For more information, refer to Writing Custom Reward Functions.
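
For example, here is a short sketch that runs one episode with random actions and prints a few of the metrics listed above from the final info dict:

import gym
import socnavgym

env = gym.make("SocNavGym-v1", config="./environment_configs/exp1_no_sngnn.yaml")
env.reset()

terminated = truncated = False
while not (terminated or truncated):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())

# metrics from the table above, read from the info dict at the end of the episode
print("SUCCESS:", info["SUCCESS"])
print("PATH_LENGTH:", info["PATH_LENGTH"])
print("PERSONAL_SPACE_COMPLIANCE:", info["PERSONAL_SPACE_COMPLIANCE"])
print("MINIMUM_DISTANCE_TO_HUMAN:", info["MINIMUM_DISTANCE_TO_HUMAN"])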

Lastly, information about interactions is returned as an adjacency list. There are two types of interactions: "human-human" and "human-laptop". For every interaction between human i and human j, the tuples (i, j) and (j, i) are present in info["interactions"]["human-human"]. Here i and j are based on the order in which the humans' observations appear in the observation dictionary, so the ith human's observation can be extracted with obs["humans"].reshape(-1, 14)[i]. Similarly, for an interaction between the ith human and the jth laptop, the tuple (i, j) is present in info["interactions"]["human-laptop"], where j is based on the order in which the laptops appear in the observation. To make this clearer, consider an example with 4 humans, 1 table, two laptops and no walls, in which two of the humans are interacting with each other and another human is interacting with a laptop. For this scenario, the observation returned would look like this:

obs_dict = {
    "humans": [obs_human0, obs_human1, obs_human2, obs_human3],  # human observations stacked in a 1D array
    "tables": [obs_table0],  # table observation
    "laptops": [obs_laptop0, obs_laptop1]  # laptop observations stacked in a 1D array.
}

Let's say the humans with observations obs_human1 and obs_human2 are the ones interacting, and the human with observation obs_human3 interacts with the laptop whose observation is obs_laptop1. In that case, the "interactions" entry of the info dict would look like this:

info = {
    "interactions": {
        "human-human": [(1, 2), (2, 1)],
        "human-laptop": [(3, 1)]
    },
    ...  # rest of the info dict
}
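
Here is a short sketch of how these adjacency lists can be combined with the observation dictionary (obs and info are assumed to come from env.step(), as in the sample code above):

humans = obs["humans"].reshape(-1, 14)    # per-human observation rows
laptops = obs["laptops"].reshape(-1, 14)  # per-laptop observation rows

for i, j in info["interactions"]["human-human"]:
    print(f"human {i} interacts with human {j}; relative positions:", humans[i][6:8], humans[j][6:8])

for i, j in info["interactions"]["human-laptop"]:
    print(f"human {i} interacts with laptop {j}; laptop position:", laptops[j][6:8])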

Reward Function

The environment provides implementations of the SNGNN reward function and the DSRNN reward function. To use one of them, set the reward_file field of the config passed to the environment to "sngnn" or "dsrnn" respectively.

The environment also allows users to provide custom reward functions. Follow the guide below to create your own reward function.

Writing Custom Reward Functions

  1. Create a new Python file in which you define a class named Reward. It must inherit from the RewardAPI class (a minimal sketch combining the steps below is given after this list). To do this, do the following:

    from socnavgym.envs.rewards import RewardAPI
    
    class Reward(RewardAPI):
        ...
  2. Override the compute_reward function with your custom reward function. Its inputs are the action of the current timestep, the previous entity observations and the current entity observations. The previous and current observations are given as dictionaries keyed by entity id, whose values are instances of the EntityObs namedtuple defined in this file. It contains the fields id, x, y, theta, sin_theta and cos_theta for each entity in the environment. Note that all these values are in the robot's frame of reference.
  3. If need be, you can also access the lists of humans, plants, interactions, etc. that the environment maintains through the self.env variable. An example of this can be found in the dsrnn_reward.py file.
  4. The RewardAPI class provides four helper functions: check_collision, check_timeout, check_reached and check_out_of_map. These boolean functions check whether the robot has collided with any entity, whether the maximum episode length has been reached, whether the robot has reached the goal, and whether the robot has moved out of the map, respectively. The last case can occur only when the environment is configured to have no walls.
  5. The RewardAPI class also provides a helper to compute the SNGNN reward: call compute_sngnn_reward(actions, prev_obs, curr_obs). If you use the SNGNN reward function in your custom reward function, please set the variable self.use_sngnn to True.
  6. You can also store any additional information to be returned in the info dict of the step function by storing it in the self.info variable of the Reward class.
  7. Anything stored in a class variable persists across the steps of an episode. After every episode, the reward class object's __init__() method is invoked again.
  8. Provide the path to the file defining your custom reward function in the config file's reward_file field.
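
Putting these steps together, here is a minimal sketch of a custom reward file. The exact argument order of compute_reward and the argument-free form of the helper checks are assumptions based on the description above; check them against socnavgym/envs/rewards and dsrnn_reward.py before use.

from socnavgym.envs.rewards import RewardAPI


class Reward(RewardAPI):
    def compute_reward(self, action, prev_obs, curr_obs):  # assumed argument order
        # terminal cases, using the helper checks provided by RewardAPI
        if self.check_collision():
            return -1.0
        if self.check_reached():
            return 1.0
        if self.check_timeout() or self.check_out_of_map():
            return -0.5

        # values surfaced in the step() info dict (see points 6 and 7 above)
        self.info["distance_reward"] = 0.0
        self.info["sngnn_reward"] = 0.0

        # placeholder shaping term: a small constant step penalty
        return -0.01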

Config File

The behaviour of the environment is controlled through the config file, which must be passed as a parameter to gym.make. The following table lists the parameters and their descriptions.

| Section | Parameter | Description |
|---------|-----------|-------------|
| rendering | resolution_view | size of the window for rendering the environment |
| rendering | milliseconds | delay parameter for waitKey() |
| episode | episode_length | maximum number of steps in an episode |
| episode | time_step | number of seconds that one step corresponds to |
| robot | robot_radius | radius of the robot |
| robot | goal_radius | radius of the robot's goal |
| robot | robot_type | accepted values are "diff-drive" (differential drive robot) and "holonomic" (holonomic robot) |
| human | human_diameter | diameter of the humans |
| human | human_goal_radius | radius of a human's goal |
| human | human_policy | policy of the humans; can be "random", "sfm", or "orca". If "random" is chosen, one of "orca" or "sfm" is picked at random |
| human | gaze_angle | the gaze value (in the observation) for a human is set to 1 when the robot lies between -gaze_angle/2 and +gaze_angle/2 in the human's frame |
| human | fov_angle | field of view of the humans |
| human | prob_to_avoid_robot | probability that a human considers the robot in its policy |
| laptop | laptop_width | width of laptops |
| laptop | laptop_length | length of laptops |
| plant | plant_radius | radius of plants |
| table | table_width | width of tables |
| table | table_length | length of tables |
| wall | wall_thickness | thickness of walls |
| human-human-interaction | interaction_radius | radius of the human crowd |
| human-human-interaction | interaction_goal_radius | radius of the human crowd's goal |
| human-human-interaction | noise_varaince | a random noise of normal(0, noise_variance) is applied to the humans' speeds to break uniformity |
| human-laptop-interaction | human_laptop_distance | distance between a human and their laptop |
| env | margin | margin for the environment |
| env | max_advance_human | maximum speed of humans |
| env | max_advance_robot | maximum linear speed of the robot |
| env | max_rotation | maximum rotational speed of the robot |
| env | wall_segment_size | size of the wall segments used when segmenting walls |
| env | speed_threshold | speed below which a human's speed is considered 0 |
| env | crowd_dispersal_probability | probability of crowd dispersal |
| env | human_laptop_dispersal_probability | probability of dispersing a human-laptop interaction |
| env | crowd_formation_probability | probability of crowd formation |
| env | human_laptop_formation_probability | probability of forming a human-laptop interaction |
| env | reward_file | path to the custom reward file. To use the built-in SNGNN or DSRNN reward function, set the value to "sngnn" or "dsrnn" respectively |
| env | cuda_device | CUDA device to use (in case of multiple CUDA devices). Keep it as 0 if running on CPU or with a single device |
| env | min_static_humans | minimum number of static humans in the environment |
| env | max_static_humans | maximum number of static humans in the environment |
| env | min_dynamic_humans | minimum number of dynamic humans in the environment |
| env | max_dynamic_humans | maximum number of dynamic humans in the environment |
| env | min_tables | minimum number of tables in the environment |
| env | max_tables | maximum number of tables in the environment |
| env | min_plants | minimum number of plants in the environment |
| env | max_plants | maximum number of plants in the environment |
| env | min_laptops | minimum number of laptops in the environment |
| env | max_laptops | maximum number of laptops in the environment |
| env | min_h_h_dynamic_interactions | minimum number of dynamic human-human interactions; these crowds can disperse if crowd_dispersal_probability is greater than 0 |
| env | max_h_h_dynamic_interactions | maximum number of dynamic human-human interactions; these crowds can disperse if crowd_dispersal_probability is greater than 0 |
| env | min_h_h_dynamic_interactions_non_dispersing | minimum number of dynamic human-human interactions that never disperse, even if crowd_dispersal_probability is greater than 0 |
| env | max_h_h_dynamic_interactions_non_dispersing | maximum number of dynamic human-human interactions that never disperse, even if crowd_dispersal_probability is greater than 0 |
| env | min_h_h_static_interactions | minimum number of static human-human interactions; these crowds can disperse if crowd_dispersal_probability is greater than 0 |
| env | max_h_h_static_interactions | maximum number of static human-human interactions; these crowds can disperse if crowd_dispersal_probability is greater than 0 |
| env | min_h_h_static_interactions_non_dispersing | minimum number of static human-human interactions that never disperse, even if crowd_dispersal_probability is greater than 0 |
| env | max_h_h_static_interactions_non_dispersing | maximum number of static human-human interactions that never disperse, even if crowd_dispersal_probability is greater than 0 |
| env | min_human_in_h_h_interactions | minimum number of humans in a human-human interaction |
| env | max_human_in_h_h_interactions | maximum number of humans in a human-human interaction |
| env | min_h_l_interactions | minimum number of human-laptop interactions; these can disperse if human_laptop_dispersal_probability is greater than 0 |
| env | max_h_l_interactions | maximum number of human-laptop interactions; these can disperse if human_laptop_dispersal_probability is greater than 0 |
| env | min_h_l_interactions_non_dispersing | minimum number of human-laptop interactions that never disperse, even if human_laptop_dispersal_probability is greater than 0 |
| env | max_h_l_interactions_non_dispersing | maximum number of human-laptop interactions that never disperse, even if human_laptop_dispersal_probability is greater than 0 |
| env | get_padded_observations | flag indicating whether padded observations are returned; can also be changed with env.set_padded_observations(True/False) |
| env | set_shape | sets the shape of the environment; accepted values are "random", "square", "rectangle", "L" and "no-walls" |
| env | add_corridors | True or False, whether the environment should contain corridors |
| env | min_map_x | minimum size of the map along the x direction |
| env | max_map_x | maximum size of the map along the x direction |
| env | min_map_y | minimum size of the map along the y direction |
| env | max_map_y | maximum size of the map along the y direction |
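
Since the configs are plain YAML files, a quick way to inspect which of the above parameters a bundled config sets is to load it directly (a small sketch assuming PyYAML is installed):

import yaml  # assumes PyYAML is available

with open("./environment_configs/exp1_no_sngnn.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg)  # dictionary of the parameter groups described in the table above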

Wrappers

Gym wrappers provide a convenient way to modify the observation and action spaces of an environment. SocNavGym implements the following four wrappers:

  1. DiscreteActions: Changes the environment from a continuous action space to a discrete action space consisting of 7 actions:

    • Turn anti-clockwise (0)
    • Turn clockwise (1)
    • Turn anti-clockwise and move forward (2)
    • Turn clockwise and move forward (3)
    • Move forward (4)
    • Move backward (5)
    • Stay still (6)

    As an example, to make the robot move forward throughout the episode, just do the following:

    import gym
    import socnavgym
    from socnavgym.wrappers import DiscreteActions
    
    env = gym.make("SocNavGym-v1", config="environment_configs/exp1_no_sngnn.yaml")  # you can pass any config
    env = DiscreteActions(env)  # creates an env with discrete action space
    
    # simulate an episode, always moving forward
    done = False
    env.reset()
    while not done:
        obs, rew, terminated, truncated, info = env.step(4)  # 4 is for moving forward 
        done = terminated or truncated
        env.render()
    
  2. NoisyObservations: Adds noise to the observations to emulate real-world sensor noise. The wrapper takes the parameters mean and std_dev. There is also a parameter called apply_noise_to, which defaults to ["robot", "humans", "tables", "laptops", "plants", "walls"], meaning all entity types. If you want to apply noise to only a few entity types, pass a list with only those entity types to this parameter. A Gaussian noise with the given mean and std_dev is added to the observations of all entities whose type is listed in apply_noise_to. As an example, to add a small noise with 0 mean and 0.1 standard deviation to all entity types, do the following:

    import gym
    import socnavgym
    from socnavgym.wrappers import NoisyObservations
    
    env = gym.make("SocNavGym-v1", config="environment_configs/exp1_no_sngnn.yaml")  # you can pass any config
    env = NoisyObservations(env, mean=0, std_dev=0.1)
    
    # simulate an episode with random actions
    done = False
    env.reset()
    while not done:
        obs, rew, terminated, truncated, info = env.step(env.action_space.sample())  # obs would now be a noisy observation
    
        done = terminated or truncated
        env.render()
    
  3. PartialObservations: Returns only the observations of entities that lie within the robot's field of view and sensor range. The wrapper takes two parameters, fov_angle and range. An example of using the PartialObservations wrapper:

    import gym
    import socnavgym
    from socnavgym.wrappers import PartialObservations
    from math import pi
    
    env = gym.make("SocNavGym-v1", config="environment_configs/exp1_no_sngnn.yaml")  # you can pass any config
    env = PartialObservations(env, fov_angle=2*pi/3, range=1)  # robot with a 120-degree field of view and a 1 m sensor range
    
    # simulate an episode with random actions
    env.reset()
    done = False
    while not done:
        obs, rew, terminated, truncated, info = env.step(env.action_space.sample())
        done = terminated or truncated
        env.render()
    
  4. WorldFrameObservations: Returns all observations in the world frame. The observation under the "robot" key then has the following structure:

    | Index | Field | Group |
    |-------|-------|-------|
    | 0-5 | enc0 ... enc5 | Encoding (one-hot) |
    | 6 | goal_x | Robot goal coordinates |
    | 7 | goal_y | Robot goal coordinates |
    | 8 | x | Robot coordinates |
    | 9 | y | Robot coordinates |
    | 10 | sin(theta) | Angular details |
    | 11 | cos(theta) | Angular details |
    | 12 | vel_x | Velocities |
    | 13 | vel_y | Velocities |
    | 14 | vel_a | Velocities |
    | 15 | radius | Radius |

    The other entity observations keep the same structure; the only difference is that positions and velocities are expressed in the world frame of reference instead of the robot's frame.

    An example of using the WorldFrameObservations wrapper:

    import gym
    import socnavgym
    from socnavgym.wrappers import WorldFrameObservations
    
    env = gym.make("SocNavGym-v1", config="environment_configs/exp1_no_sngnn.yaml")  # you can pass any config
    env = WorldFrameObservations(env) 
    
    # simulate an episode with random actions
    env.reset()
    done = False
    while not done:
        obs, rew, terminated, truncated, info = env.step(env.action_space.sample())  # obs contains observations that are in the world frame 
        done = terminated or truncated
        env.render()
    

Training Agents

The script to train the agents is stable_dqn.py, an implementation of Dueling DQN using Stable Baselines3. We use Comet ML for logging, so please create an account (it is completely free) before proceeding. Run the following commands to reproduce our results on the experiments mentioned in the paper:

  1. Experiment 1 (Using DSRNN Reward)

    python3 stable_dqn.py -e="./environment_configs/exp1_no_sngnn.yaml" -r="dsrnn_exp1" -s="dsrnn_exp1" -d=False -p=<project_name> -a=<api_key>
  2. Experiment 1 (Using SNGNN Reward)

    python3 stable_dqn.py -e="./environment_configs/exp1_with_sngnn.yaml" -r="sngnn_exp1" -s="sngnn_exp1" -d=False -p=<project_name> -a=<api_key>
  3. Experiment 2 (Using DSRNN Reward)

    python3 stable_dqn.py -e="./environment_configs/exp2_no_sngnn.yaml" -r="dsrnn_exp2" -s="dsrnn_exp2" -d=False -p=<project_name> -a=<api_key>
  4. Experiment 2 (Using SNGNN Reward)

    python3 stable_dqn.py -e="./environment_configs/exp2_with_sngnn.yaml" -r="sngnn_exp2" -s="sngnn_exp2" -d=False -p=<project_name> -a=<api_key>
  5. Experiment 3 (Using DSRNN Reward)

    python3 stable_dqn.py -e="./environment_configs/exp3_no_sngnn.yaml" -r="dsrnn_exp3" -s="dsrnn_exp3" -d=False -p=<project_name> -a=<api_key>
  6. Experiment 3 (Using SNGNN Reward)

    python3 stable_dqn.py -e="./environment_configs/exp3_with_sngnn.yaml" -r="sngnn_exp3" -s="sngnn_exp3" -d=False -p=<project_name> -a=<api_key>

    In general, the stable_dqn script can be used as follows:

    
    usage: python3 stable_dqn.py [-h] -e ENV_CONFIG -r RUN_NAME -s SAVE_PATH -p PROJECT_NAME -a API_KEY [-d USE_DEEP_NET] [-g GPU]

    optional arguments:
      -h, --help            show this help message and exit
      -e ENV_CONFIG, --env_config ENV_CONFIG
                            path to environment config
      -r RUN_NAME, --run_name RUN_NAME
                            name of comet_ml run
      -s SAVE_PATH, --save_path SAVE_PATH
                            path to save the model
      -p PROJECT_NAME, --project_name PROJECT_NAME
                            project name in comet ml
      -a API_KEY, --api_key API_KEY
                            api key to your comet ml profile
      -d USE_DEEP_NET, --use_deep_net USE_DEEP_NET
                            True or False, based on whether you want a transformer based feature extractor
      -g GPU, --gpu GPU     gpu id to use


Evaluating Agents

The evaluation script for the Dueling DQN agent trained using Stable Baselines3 can be found in sb3_eval.py.

usage: python3 sb3_eval.py [-h] -n NUM_EPISODES -w WEIGHT_PATH -c CONFIG

optional arguments:
  -h, --help            show this help message and exit
  -n NUM_EPISODES, --num_episodes NUM_EPISODES
                        number of episodes
  -w WEIGHT_PATH, --weight_path WEIGHT_PATH
                        path to weight file
  -c CONFIG, --config CONFIG
                        path to config file

Manually Controlling the Robot

You can control the robot using a joystick, and also record observations, actions and rewards. To do this, run the manual_control_js.py script:

usage: python3 manual_control_js.py [-h] -n NUM_EPISODES [-j JOYSTICK_ID] [-c CONFIG] [-r RECORD] [-s START]

optional arguments:
  -h, --help            show this help message and exit
  -n NUM_EPISODES, --num_episodes NUM_EPISODES
                        number of episodes
  -j JOYSTICK_ID, --joystick_id JOYSTICK_ID
                        Joystick identifier
  -c CONFIG, --config CONFIG
                        Environment config file
  -r RECORD, --record RECORD
                        Whether you want to record the observations, and actions or not
  -s START, --start START
                        starting episode number

Tutorials

  1. Installation tutorial
  2. Training a Deep RL agent on SocNavGym