gnns4hri / SocNavGym

GNU General Public License v3.0

SocNavGym: An environment for Social Navigation

Table of Contents

  1. Description
  2. Installation
  3. Usage
  4. Sample Code
  5. About the environment
  6. Conventions
  7. Observation Space
  8. Action Space
  9. Info Dict
  10. Reward Function
  11. Writing Custom Reward Functions
  12. Config File
  13. Wrappers
  14. Training Agents
  15. Evaluating Agents
  16. Manually Controlling the Robot
  17. Tutorials

Description

This repository contains the implementation of our paper "SocNavGym: A Reinforcement Learning Gym for Social Navigation", published at IEEE RO-MAN 2023.

Installation

  1. Install Python-RVO2 by following the instructions in its repository.
  2. Install DGL (Deep Graph Library) for your system by following its official installation instructions.
  3. For installing the environment using pip:

    python3 -m pip install socnavgym

    For installing from source:

    git clone https://github.com/gnns4hri/SocNavGym.git
    python3 -m pip install .  # to install the environment to your Python libraries. This is optional. If you don't run this, then just make sure that your current working directory is the root of the repository when importing socnavgym.
  4. The Deep RL agents are written using Stable Baselines3. We used the following command to install SB3 for our experiments:
    pip install git+https://github.com/carlosluis/stable-baselines3@fix_tests

    This is NOT required for the environment to run. If you are going to use stable_dqn.py, we recommend installing SB3 with the command above. Installing stable-baselines3 normally through pip might work, but we cannot guarantee it since it was not tested.

Usage

import socnavgym
import gym
env = gym.make('SocNavGym-v1', config="<PATH_TO_CONFIG>")  

Sample Code

import socnavgym
import gym
env = gym.make("SocNavGym-v1", config="./environment_configs/exp1_no_sngnn.yaml") 
obs, _ = env.reset()

for i in range(1000):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    env.render()
    if terminated or truncated:
        env.reset()

About the environment

SocNavGym-v1 is a highly customisable environment whose parameters can be controlled through config files. A few example config files are provided in environment_configs/. For a better understanding of each parameter, refer to this section. Apart from the robot, the environment supports entities such as plants, tables and laptops. It also models human-human and human-laptop interactions, and can contain both moving and static crowds; new crowds and interactions can form, and existing ones can disperse. The environment follows the OpenAI Gym format, implementing the step, render and reset functions, and uses the latest Gym API (gym 0.26.2).

Conventions

Observation Space

The observation returned when env.step(action) is called consists of the following (all values are in the robot's frame unless you use the WorldFrameObservations wrapper):

The observation is of the type gym.Spaces.Dict. The dictionary has the following keys:

  1. "robot" : This is a vector of shape (9,) of which the first six values represent the one-hot encoding of the robot, i.e [1, 0, 0, 0, 0, 0]. The next two values represent the goal's x and y coordinates in the robot frame and the last value is the robot's radius.

  2. The other keys present in the observation are "humans", "plants", "laptops", "tables" and "walls". Every entity (human, plant, laptop, table, or wall) has an observation vector with the structure given below:

    | Index | Field | Group |
    |-------|-------|-------|
    | 0-5 | enc0 ... enc5 | Encoding (one-hot) |
    | 6 | x | Relative position coordinates |
    | 7 | y | Relative position coordinates |
    | 8 | sin(theta) | Relative orientation |
    | 9 | cos(theta) | Relative orientation |
    | 10 | radius | Radius |
    | 11 | relative speed | Relative speeds |
    | 12 | relative angular speed | Relative speeds |
    | 13 | gaze | Gaze |

    Details of the field values:

    • One hot encodings of the object.

      The one hot encodings are as follows:

      • human: [0, 1, 0, 0, 0, 0]
      • table: [0, 0, 1, 0, 0, 0]
      • laptop: [0, 0, 0, 1, 0, 0]
      • plant: [0, 0, 0, 0, 1, 0]
      • wall: [0, 0, 0, 0, 0, 1]
    • x, y coordinates relative to the robot. For rectangular objects, the coordinates correspond to the centre of geometry.

    • theta: the orientation with respect to the robot.

    • radius: radius of the object. For rectangular objects, this is the radius of the circle that circumscribes the rectangle.

    • relative translational speed: the magnitude of the entity's velocity relative to the robot.

    • relative angular speed: computed as the difference between the entity's orientation across two consecutive time steps, divided by the time step.

    • gaze value: for humans, it is 1 if the robot lies in the human's line of sight, and 0 otherwise. For entities other than humans, the gaze value is 0. The robot is in a human's line of sight if it lies between -gaze_angle/2 and +gaze_angle/2 in the human's frame. The gaze angle can be changed through the gaze_angle parameter in the config file.

    The observation vectors of all entities of the same type are concatenated into a single vector and placed under the corresponding key in the dictionary. For example, if there are 4 humans, the four vectors of shape (14,) are concatenated into a vector of shape (56,), which is stored under the "humans" key of the observation dictionary. Individual observations can be accessed by simply reshaping the observation to (-1, 14), as shown in the example after this list.

    For walls, each wall is segmented into smaller walls of size wall_segment_size (which can be found in the config). Observations from each segment are returned in obs["walls"].

  3. The observation space of the environment can be obtained by calling env.observation_space.
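
For instance, here is a small sketch of reading per-entity observations (the config path is just one of the bundled examples; the indices follow the table above):

import gym
import socnavgym

env = gym.make("SocNavGym-v1", config="./environment_configs/exp1_no_sngnn.yaml")
obs, _ = env.reset()

# goal coordinates in the robot frame (indices 6 and 7 of the "robot" vector)
goal_x, goal_y = obs["robot"][6], obs["robot"][7]

# each human contributes a 14-dimensional vector; reshape to access them individually
humans = obs["humans"].reshape(-1, 14)
for i, h in enumerate(humans):
    rel_x, rel_y, gaze = h[6], h[7], h[13]  # position relative to the robot, gaze flag
    print(f"human {i}: x={rel_x:.2f}, y={rel_y:.2f}, gaze={gaze}")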

Action Space

The action space for a holonomic robot consists of three components: vx, vy and va, where the X axis points along the robot's heading direction. For differential-drive robots, the vy component is 0. You can control the type of the robot using the config file's robot_type parameter. All three components take a value between -1 and 1, which is then mapped to the corresponding speed using the maxima set in the config file. If you want to use a discrete action space, you can use the DiscreteActions wrapper (see Wrappers below).
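
As a minimal sketch (assuming the continuous action space described above, where each component is normalised to [-1, 1] and scaled by max_advance_robot / max_rotation from the config):

import gym
import numpy as np
import socnavgym

env = gym.make("SocNavGym-v1", config="./environment_configs/exp1_no_sngnn.yaml")
env.reset()

# [vx, vy, va] in [-1, 1]; vy is ignored (treated as 0) for a differential-drive robot
action = np.array([1.0, 0.0, 0.0], dtype=np.float32)  # drive straight ahead at full speed
obs, reward, terminated, truncated, info = env.step(action)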

Info Dict

The environment also returns meaningful metrics at every step in an episode. The following table describes each metric that is returned in the info dict.

| Metric | Description |
|--------|-------------|
| "OUT_OF_MAP" | Boolean value indicating whether the robot went out of the map |
| "COLLISION_HUMAN" | Boolean value indicating whether the robot collided with a human |
| "COLLISION_OBJECT" | Boolean value indicating whether the robot collided with an object |
| "COLLISION_WALL" | Boolean value indicating whether the robot collided with a wall |
| "COLLISION" | Boolean value indicating whether the robot collided with any entity |
| "SUCCESS" | Boolean value indicating whether the robot reached the goal |
| "TIMEOUT" | Boolean value indicating whether the episode terminated because the maximum number of steps was reached |
| "FAILURE_TO_PROGRESS" | Number of timesteps in which the robot failed to reduce its distance to the goal |
| "STALLED_TIME" | Number of timesteps for which the robot's velocity was 0 |
| "TIME_TO_REACH_GOAL" | Number of timesteps taken by the robot to reach its goal |
| "STL" | Success weighted by time length |
| "SPL" | Success weighted by path length |
| "PATH_LENGTH" | Total path length covered by the robot |
| "V_MIN" | Minimum velocity reached by the robot |
| "V_AVG" | Average velocity of the robot |
| "V_MAX" | Maximum velocity reached by the robot |
| "A_MIN" | Minimum acceleration reached by the robot |
| "A_AVG" | Average acceleration of the robot |
| "A_MAX" | Maximum acceleration reached by the robot |
| "JERK_MIN" | Minimum jerk reached by the robot |
| "JERK_AVG" | Average jerk of the robot |
| "JERK_MAX" | Maximum jerk reached by the robot |
| "TIME_TO_COLLISION" | Minimum time to collision with any human at any point in the trajectory, assuming the robot and all humans keep moving along linear trajectories |
| "MINIMUM_DISTANCE_TO_HUMAN" | Minimum distance to any human |
| "PERSONAL_SPACE_COMPLIANCE" | Percentage of steps in which the robot is not within the personal space (0.45 m) of any human |
| "MINIMUM_OBSTACLE_DISTANCE" | Minimum distance to any object |
| "AVERAGE_OBSTACLE_DISTANCE" | Average distance to any object |

Some additional metrics that are also provided are :

| Metric | Description |
|--------|-------------|
| "DISCOMFORT_SNGNN" | SNGNN value (more about SNGNN in the section below) |
| "DISCOMFORT_DSRNN" | DSRNN reward value (more about the DSRNN reward function in the section below) |
| "sngnn_reward" | SNGNN value - 1 |
| "distance_reward" | Value of the distance reward |

Note that the above four values are returned correctly only if the reward function parameter in the config file is "dsrnn" or "sngnn". If a custom reward function is used, the user is required to fill in these values; otherwise zeros are returned for them. For more information, refer to Writing Custom Reward Functions.
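
For example, here is a short sketch that runs one episode with random actions and prints a few of the metrics listed above from the final info dict:

import gym
import socnavgym

env = gym.make("SocNavGym-v1", config="./environment_configs/exp1_no_sngnn.yaml")
env.reset()

terminated = truncated = False
while not (terminated or truncated):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())

# metrics from the table above, read from the info dict at the end of the episode
print("SUCCESS:", info["SUCCESS"])
print("PATH_LENGTH:", info["PATH_LENGTH"])
print("PERSONAL_SPACE_COMPLIANCE:", info["PERSONAL_SPACE_COMPLIANCE"])
print("MINIMUM_DISTANCE_TO_HUMAN:", info["MINIMUM_DISTANCE_TO_HUMAN"])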

Lastly, information about interactions is returned as an adjacency list. There are two types of interactions: "human-human" and "human-laptop". For every interaction between human i and human j, the tuples (i, j) and (j, i) are present in info["interactions"]["human-human"]. Here i and j are based on the order in which the humans' observations appear in the observation dictionary, so the ith human's observation can be extracted with obs["humans"].reshape(-1, 14)[i]. Similarly, for an interaction between the ith human and the jth laptop, the tuple (i, j) is present in info["interactions"]["human-laptop"], where j is based on the order in which the laptops appear in the observation. To make this clearer, consider an example with 4 humans, 1 table, two laptops and no walls, in which two of the humans are interacting with each other and another human is interacting with a laptop. For this scenario, the observation returned would look like this:

obs_dict = {
    "humans": [obs_human0, obs_human1, obs_human2, obs_human3],  # human observations stacked in a 1D array
    "tables": [obs_table0],  # table observation
    "laptops": [obs_laptop0, obs_laptop1]  # laptop observations stacked in a 1D array.
}

Let's say the humans with observations obs_human1 and obs_human2 are the ones interacting, and the human with observation obs_human3 interacts with the laptop whose observation is obs_laptop1. In that case, the "interactions" entry of the info dict would look like this:

info = {
    "interactions": {
        "human-human": [(1, 2), (2, 1)],
        "human-laptop": [(3, 1)]
    },
    ...  # rest of the info dict
}
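
Here is a short sketch of how these adjacency lists can be combined with the observation dictionary (obs and info are assumed to come from env.step(), as in the sample code above):

humans = obs["humans"].reshape(-1, 14)    # per-human observation rows
laptops = obs["laptops"].reshape(-1, 14)  # per-laptop observation rows

for i, j in info["interactions"]["human-human"]:
    print(f"human {i} interacts with human {j}; relative positions:", humans[i][6:8], humans[j][6:8])

for i, j in info["interactions"]["human-laptop"]:
    print(f"human {i} interacts with laptop {j}; laptop position:", laptops[j][6:8])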

Reward Function

The environment provides implementations of the SNGNN reward function and the DSRNN reward function. To use one of them, set the reward_file field of the config passed to the environment to "sngnn" or "dsrnn" respectively.

The environment also allows users to provide custom reward functions. Follow the guide below to create your own reward function.

Writing Custom Reward Functions

  1. Create a new Python file in which you define a class named Reward. It must inherit from the RewardAPI class (a minimal sketch combining the steps below is given after this list). To do this, do the following:

    from socnavgym.envs.rewards import RewardAPI
    
    class Reward(RewardAPI):
        ...
  2. Override the compute_reward function with your custom reward function. Its inputs are the action of the current timestep, the previous entity observations and the current entity observations. The previous and current observations are given as dictionaries keyed by entity id, whose values are instances of the EntityObs namedtuple defined in this file. It contains the fields id, x, y, theta, sin_theta and cos_theta for each entity in the environment. Note that all these values are in the robot's frame of reference.
  3. If need be, you can also access the lists of humans, plants, interactions, etc. that the environment maintains through the self.env variable. An example of this can be found in the dsrnn_reward.py file.
  4. The RewardAPI class provides four helper functions: check_collision, check_timeout, check_reached and check_out_of_map. These boolean functions check whether the robot has collided with any entity, whether the maximum episode length has been reached, whether the robot has reached the goal, and whether the robot has moved out of the map, respectively. The last case can occur only when the environment is configured to have no walls.
  5. The RewardAPI class also provides a helper to compute the SNGNN reward: call compute_sngnn_reward(actions, prev_obs, curr_obs). If you use the SNGNN reward function in your custom reward function, please set the variable self.use_sngnn to True.
  6. You can also store any additional information to be returned in the info dict of the step function by storing it in the self.info variable of the Reward class.
  7. Anything stored in a class variable persists across the steps of an episode. After every episode, the reward class object's __init__() method is invoked again.
  8. Provide the path to the file defining your custom reward function in the config file's reward_file field.
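
Putting these steps together, here is a minimal sketch of a custom reward file. The exact argument order of compute_reward and the argument-free form of the helper checks are assumptions based on the description above; check them against socnavgym/envs/rewards and dsrnn_reward.py before use.

from socnavgym.envs.rewards import RewardAPI


class Reward(RewardAPI):
    def compute_reward(self, action, prev_obs, curr_obs):  # assumed argument order
        # terminal cases, using the helper checks provided by RewardAPI
        if self.check_collision():
            return -1.0
        if self.check_reached():
            return 1.0
        if self.check_timeout() or self.check_out_of_map():
            return -0.5

        # values surfaced in the step() info dict (see points 6 and 7 above)
        self.info["distance_reward"] = 0.0
        self.info["sngnn_reward"] = 0.0

        # placeholder shaping term: a small constant step penalty
        return -0.01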

Config File

The behaviour of the environment is controlled through the config file, which must be passed as a parameter to gym.make. The following table lists the parameters and their descriptions.

| Section | Parameter | Description |
|---------|-----------|-------------|
| rendering | resolution_view | size of the window for rendering the environment |
| rendering | milliseconds | delay parameter for waitKey() |
| episode | episode_length | maximum number of steps in an episode |
| episode | time_step | number of seconds that one step corresponds to |
| robot | robot_radius | radius of the robot |
| robot | goal_radius | radius of the robot's goal |
| robot | robot_type | accepted values are "diff-drive" (differential drive robot) and "holonomic" (holonomic robot) |
| human | human_diameter | diameter of the humans |
| human | human_goal_radius | radius of a human's goal |
| human | human_policy | policy of the humans; can be "random", "sfm", or "orca". If "random" is chosen, one of "orca" or "sfm" is picked at random |
| human | gaze_angle | the gaze value (in the observation) for a human is set to 1 when the robot lies between -gaze_angle/2 and +gaze_angle/2 in the human's frame |
| human | fov_angle | field of view of the humans |
| human | prob_to_avoid_robot | probability that a human considers the robot in its policy |
| laptop | laptop_width | width of laptops |
| laptop | laptop_length | length of laptops |
| plant | plant_radius | radius of plants |
| table | table_width | width of tables |
| table | table_length | length of tables |
| wall | wall_thickness | thickness of walls |
| human-human-interaction | interaction_radius | radius of the human crowd |
| human-human-interaction | interaction_goal_radius | radius of the human crowd's goal |
| human-human-interaction | noise_varaince | a random noise of normal(0, noise_variance) is applied to the humans' speeds to break uniformity |
| human-laptop-interaction | human_laptop_distance | distance between a human and their laptop |
| env | margin | margin for the environment |
| env | max_advance_human | maximum speed of humans |
| env | max_advance_robot | maximum linear speed of the robot |
| env | max_rotation | maximum rotational speed of the robot |
| env | wall_segment_size | size of the wall segments used when segmenting walls |
| env | speed_threshold | speed below which a human's speed is considered 0 |
| env | crowd_dispersal_probability | probability of crowd dispersal |
| env | human_laptop_dispersal_probability | probability of dispersing a human-laptop interaction |
| env | crowd_formation_probability | probability of crowd formation |
| env | human_laptop_formation_probability | probability of forming a human-laptop interaction |
| env | reward_file | path to the custom reward file. To use the built-in SNGNN or DSRNN reward function, set the value to "sngnn" or "dsrnn" respectively |
| env | cuda_device | CUDA device to use (in case of multiple CUDA devices). Keep it as 0 if running on CPU or with a single device |
| env | min_static_humans | minimum number of static humans in the environment |
| env | max_static_humans | maximum number of static humans in the environment |
| env | min_dynamic_humans | minimum number of dynamic humans in the environment |
| env | max_dynamic_humans | maximum number of dynamic humans in the environment |
| env | min_tables | minimum number of tables in the environment |
| env | max_tables | maximum number of tables in the environment |
| env | min_plants | minimum number of plants in the environment |
| env | max_plants | maximum number of plants in the environment |
| env | min_laptops | minimum number of laptops in the environment |
| env | max_laptops | maximum number of laptops in the environment |
| env | min_h_h_dynamic_interactions | minimum number of dynamic human-human interactions; these crowds can disperse if crowd_dispersal_probability is greater than 0 |
| env | max_h_h_dynamic_interactions | maximum number of dynamic human-human interactions; these crowds can disperse if crowd_dispersal_probability is greater than 0 |
| env | min_h_h_dynamic_interactions_non_dispersing | minimum number of dynamic human-human interactions that never disperse, even if crowd_dispersal_probability is greater than 0 |
| env | max_h_h_dynamic_interactions_non_dispersing | maximum number of dynamic human-human interactions that never disperse, even if crowd_dispersal_probability is greater than 0 |
| env | min_h_h_static_interactions | minimum number of static human-human interactions; these crowds can disperse if crowd_dispersal_probability is greater than 0 |
| env | max_h_h_static_interactions | maximum number of static human-human interactions; these crowds can disperse if crowd_dispersal_probability is greater than 0 |
| env | min_h_h_static_interactions_non_dispersing | minimum number of static human-human interactions that never disperse, even if crowd_dispersal_probability is greater than 0 |
| env | max_h_h_static_interactions_non_dispersing | maximum number of static human-human interactions that never disperse, even if crowd_dispersal_probability is greater than 0 |
| env | min_human_in_h_h_interactions | minimum number of humans in a human-human interaction |
| env | max_human_in_h_h_interactions | maximum number of humans in a human-human interaction |
| env | min_h_l_interactions | minimum number of human-laptop interactions; these can disperse if human_laptop_dispersal_probability is greater than 0 |
| env | max_h_l_interactions | maximum number of human-laptop interactions; these can disperse if human_laptop_dispersal_probability is greater than 0 |
| env | min_h_l_interactions_non_dispersing | minimum number of human-laptop interactions that never disperse, even if human_laptop_dispersal_probability is greater than 0 |
| env | max_h_l_interactions_non_dispersing | maximum number of human-laptop interactions that never disperse, even if human_laptop_dispersal_probability is greater than 0 |
| env | get_padded_observations | flag indicating whether padded observations are returned; can also be changed with env.set_padded_observations(True/False) |
| env | set_shape | sets the shape of the environment; accepted values are "random", "square", "rectangle", "L" and "no-walls" |
| env | add_corridors | True or False, whether the environment should contain corridors |
| env | min_map_x | minimum size of the map along the x direction |
| env | max_map_x | maximum size of the map along the x direction |
| env | min_map_y | minimum size of the map along the y direction |
| env | max_map_y | maximum size of the map along the y direction |
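
Since the configs are plain YAML files, a quick way to inspect which of the above parameters a bundled config sets is to load it directly (a small sketch assuming PyYAML is installed):

import yaml  # assumes PyYAML is available

with open("./environment_configs/exp1_no_sngnn.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg)  # dictionary of the parameter groups described in the table above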

Wrappers

Gym wrappers provide a convenient way to modify the observation and action spaces of an environment. SocNavGym implements the following four wrappers:

  1. DiscreteActions: Changes the environment from a continuous action space to a discrete action space consisting of 7 actions:

    • Turn anti-clockwise (0)
    • Turn clockwise (1)
    • Turn anti-clockwise and move forward (2)
    • Turn clockwise and move forward (3)
    • Move forward (4)
    • Move backward (5)
    • Stay still (6)

    As an example, to make the robot move forward throughout the episode, just do the following:

    import gym
    import socnavgym
    from socnavgym.wrappers import DiscreteActions
    
    env = gym.make("SocNavGym-v1", config="environment_configs/exp1_no_sngnn.yaml")  # you can pass any config
    env = DiscreteActions(env)  # creates an env with discrete action space
    
    # simulate an episode, always moving forward
    done = False
    env.reset()
    while not done:
        obs, rew, terminated, truncated, info = env.step(4)  # 4 is for moving forward 
        done = terminated or truncated
        env.render()
    
  2. NoisyObservations: Adds noise to the observations to emulate real-world sensor noise. The wrapper takes the parameters mean and std_dev. There is also a parameter called apply_noise_to, which defaults to ["robot", "humans", "tables", "laptops", "plants", "walls"], meaning all entity types. If you want to apply noise to only a few entity types, pass a list with only those entity types to this parameter. A Gaussian noise with the given mean and std_dev is added to the observations of all entities whose type is listed in apply_noise_to. As an example, to add a small noise with 0 mean and 0.1 standard deviation to all entity types, do the following:

    import gym
    import socnavgym
    from socnavgym.wrappers import NoisyObservations
    
    env = gym.make("SocNavGym-v1", config="environment_configs/exp1_no_sngnn.yaml")  # you can pass any config
    env = NoisyObservations(env, mean=0, std_dev=0.1)
    
    # simulate an episode with random actions
    done = False
    env.reset()
    while not done:
        obs, rew, terminated, truncated, info = env.step(env.action_space.sample())  # obs would now be a noisy observation
    
        done = terminated or truncated
        env.render()
    
  3. PartialObservations: Returns only the observations of entities that lie within the robot's field of view and sensor range. The wrapper takes two parameters, fov_angle and range. An example of using the PartialObservations wrapper:

    import gym
    import socnavgym
    from socnavgym.wrappers import PartialObservations
    from math import pi
    
    env = gym.make("SocNavGym-v1", config="environment_configs/exp1_no_sngnn.yaml")  # you can pass any config
    env = PartialObservations(env, fov_angle=2*pi/3, range=1)  # robot with a 120-degree field of view and a 1 m sensor range
    
    # simulate an episode with random actions
    env.reset()
    done = False
    while not done:
        obs, rew, terminated, truncated, info = env.step(env.action_space.sample())
        done = terminated or truncated
        env.render()
    
  4. WorldFrameObservations: Returns all observations in the world frame. The observation under the "robot" key then has the following structure:

    | Index | Field | Group |
    |-------|-------|-------|
    | 0-5 | enc0 ... enc5 | Encoding (one-hot) |
    | 6 | goal_x | Robot goal coordinates |
    | 7 | goal_y | Robot goal coordinates |
    | 8 | x | Robot coordinates |
    | 9 | y | Robot coordinates |
    | 10 | sin(theta) | Angular details |
    | 11 | cos(theta) | Angular details |
    | 12 | vel_x | Velocities |
    | 13 | vel_y | Velocities |
    | 14 | vel_a | Velocities |
    | 15 | radius | Radius |

    The other entity observations keep the same structure; the only difference is that positions and velocities are expressed in the world frame of reference instead of the robot's frame.

    An example of using the WorldFrameObservations wrapper:

    import gym
    import socnavgym
    from socnavgym.wrappers import WorldFrameObservations
    
    env = gym.make("SocNavGym-v1", config="environment_configs/exp1_no_sngnn.yaml")  # you can pass any config
    env = WorldFrameObservations(env) 
    
    # simulate an episode with random actions
    env.reset()
    done = False
    while not done:
        obs, rew, terminated, truncated, info = env.step(env.action_space.sample())  # obs contains observations that are in the world frame 
        done = terminated or truncated
        env.render()
    

Training Agents

The script to train the agents is stable_dqn.py, an implementation of Dueling DQN using Stable Baselines3. We use Comet ML for logging, so please create an account (it is completely free) before proceeding. Run the following commands to reproduce our results on the experiments mentioned in the paper:

  1. Experiment 1 (Using DSRNN Reward)

    python3 stable_dqn.py -e="./environment_configs/exp1_no_sngnn.yaml" -r="dsrnn_exp1" -s="dsrnn_exp1" -d=False -p=<project_name> -a=<api_key>
  2. Experiment 1 (Using SNGNN Reward)

    python3 stable_dqn.py -e="./environment_configs/exp1_with_sngnn.yaml" -r="sngnn_exp1" -s="sngnn_exp1" -d=False -p=<project_name> -a=<api_key>
  3. Experiment 2 (Using DSRNN Reward)

    python3 stable_dqn.py -e="./environment_configs/exp2_no_sngnn.yaml" -r="dsrnn_exp2" -s="dsrnn_exp2" -d=False -p=<project_name> -a=<api_key>
  4. Experiment 2 (Using SNGNN Reward)

    python3 stable_dqn.py -e="./environment_configs/exp2_with_sngnn.yaml" -r="sngnn_exp2" -s="sngnn_exp2" -d=False -p=<project_name> -a=<api_key>
  5. Experiment 3 (Using DSRNN Reward)

    python3 stable_dqn.py -e="./environment_configs/exp3_no_sngnn.yaml" -r="dsrnn_exp3" -s="dsrnn_exp3" -d=False -p=<project_name> -a=<api_key>
  6. Experiment 3 (Using SNGNN Reward)

    python3 stable_dqn.py -e="./environment_configs/exp3_with_sngnn.yaml" -r="sngnn_exp3" -s="sngnn_exp3" -d=False -p=<project_name> -a=<api_key>

    In general, the stable_dqn script can be used as follows:

    
    usage: python3 stable_dqn.py [-h] -e ENV_CONFIG -r RUN_NAME -s SAVE_PATH -p PROJECT_NAME -a API_KEY [-d USE_DEEP_NET] [-g GPU]

    optional arguments:
      -h, --help            show this help message and exit
      -e ENV_CONFIG, --env_config ENV_CONFIG
                            path to environment config
      -r RUN_NAME, --run_name RUN_NAME
                            name of comet_ml run
      -s SAVE_PATH, --save_path SAVE_PATH
                            path to save the model
      -p PROJECT_NAME, --project_name PROJECT_NAME
                            project name in comet ml
      -a API_KEY, --api_key API_KEY
                            api key to your comet ml profile
      -d USE_DEEP_NET, --use_deep_net USE_DEEP_NET
                            True or False, based on whether you want a transformer based feature extractor
      -g GPU, --gpu GPU     gpu id to use


Evaluating Agents

The evaluation script for the Dueling DQN agent trained using Stable Baselines3 can be found in sb3_eval.py.

usage: python3 sb3_eval.py [-h] -n NUM_EPISODES -w WEIGHT_PATH -c CONFIG

optional arguments:
  -h, --help            show this help message and exit
  -n NUM_EPISODES, --num_episodes NUM_EPISODES
                        number of episodes
  -w WEIGHT_PATH, --weight_path WEIGHT_PATH
                        path to weight file
  -c CONFIG, --config CONFIG
                        path to config file

Manually Controlling the Robot

You can control the robot using a joystick, and also record observations, actions and rewards. To do this, run the manual_control_js.py script:

usage: python3 manual_control_js.py [-h] -n NUM_EPISODES [-j JOYSTICK_ID] [-c CONFIG] [-r RECORD] [-s START]

optional arguments:
  -h, --help            show this help message and exit
  -n NUM_EPISODES, --num_episodes NUM_EPISODES
                        number of episodes
  -j JOYSTICK_ID, --joystick_id JOYSTICK_ID
                        Joystick identifier
  -c CONFIG, --config CONFIG
                        Environment config file
  -r RECORD, --record RECORD
                        Whether you want to record the observations, and actions or not
  -s START, --start START
                        starting episode number

Tutorials

  1. Installation tutorial
  2. Training a Deep RL agent on SocNavGym