FazelYU / Adaptive-Navigation


Abstract

Traffic congestion in urban road networks is a condition characterised by slower speeds, longer trip times, increased air pollution, and driver frustration. Traffic congestion can be attributed to a volume of traffic that generates demand for space greater than the available street capacity. A number of other specific circumstances can also cause or aggravate congestion, including traffic incidents, road maintenance work and bad weather conditions.

While construction of new road infrastructure is an expensive solution, traffic flow optimization using route planning algorithms is considered a more economical and sustainable alternative. Currently, well-known publicly available car navigation services, such as Google Maps and Waze, help people with route planning. These systems mainly rely on variants of the popular Shortest Path First (SPF) algorithm to suggest a route, assuming a static network. However, road network conditions are dynamic, which can cause SPF-based route planning to perform sub-optimally. In addition, SPF is a greedy algorithm: while it can yield locally optimal solutions that approximate a globally optimal solution in a reasonable amount of time, it does not always produce an optimal solution. For example, in a road network with limited capacity, the SPF routing algorithm can cause congestion by greedily routing all vehicles toward the same road (the shortest path).
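The congestion failure mode of static SPF can be illustrated with a minimal sketch (toy code, not the repo's implementation): a Dijkstra-based planner over fixed edge travel times sends every vehicle down the same marginally-shorter route, ignoring how many vehicles are already on it.

```python
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra's algorithm over static edge travel times."""
    dist, prev = {src: 0.0}, {}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

# Toy network: two parallel routes from A to B, the upper one marginally faster.
graph = {
    "A": [("U", 5.0), ("L", 5.5)],
    "U": [("B", 5.0)],
    "L": [("B", 5.5)],
    "B": [],
}

# A static SPF planner assigns every vehicle the same (shortest) route,
# regardless of how many vehicles are already queued on it.
routes = [shortest_path(graph, "A", "B") for _ in range(100)]
assert all(r == ["A", "U", "B"] for r in routes)
```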

To address limitations and challenges of the current approach to solve the traffic congestion problem, we propose a network-aware multi-agent reinforcement learning (MARL) model for the navigation of a fleet of vehicles in the road network. The proposed model is adaptive to the current traffic conditions of the road network. The main idea is that a Reinforcement Learning (RL) agent is assigned to every road intersection and operates as a router agent, responsible for providing routing instructions to a vehicle in the network. The vehicle traveling in the road network is aware of its final destination but not its exact full route/path to it. When it reaches an intersection, it generates a routing query to the RL agent assigned to that intersection, consisting of its final destination. The RL agent generates a routing response based on (i) the vehicle’s destination, (ii) the current state of the network in the neighborhood of the agent aggregated with a shared graph attention network (GAT) model, and (iii) routing policies learned by cooperating with other RL agents assigned to neighboring intersections. The vehicle follows the routing response from the router agents until it reaches its destination.
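The query/response loop described above can be sketched as follows. This is a simplified, hypothetical rendering (tabular values, epsilon-greedy choice); in the actual model each agent's estimates come from a neural network conditioned on the GAT-aggregated neighborhood state.

```python
import random
from collections import defaultdict

class RouterAgent:
    """One agent per intersection (illustrative sketch, not the repo's class).

    The table maps (vehicle destination, outgoing road) to an estimated
    cost-to-go; the real model replaces this table with a learned network."""

    def __init__(self, node, out_roads, epsilon=0.1):
        self.node = node
        self.out_roads = out_roads      # neighboring intersections
        self.q = defaultdict(float)     # (destination, next_hop) -> est. cost
        self.epsilon = epsilon

    def route(self, destination):
        """Answer a routing query: pick the next road for this vehicle."""
        if random.random() < self.epsilon:  # explore during training
            return random.choice(self.out_roads)
        return min(self.out_roads, key=lambda nxt: self.q[(destination, nxt)])

def drive(agents, origin, destination, max_hops=50):
    """A vehicle knows only its destination; at each intersection it asks
    the local agent where to go next, until it arrives."""
    node, path = origin, [origin]
    while node != destination and len(path) < max_hops:
        node = agents[node].route(destination)
        path.append(node)
    return path
```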

Through an extensive experimental evaluation on both synthetic and realistic road networks, we demonstrate that the proposed MARL model can outperform the SPF algorithm by up to 20% in average travel time.

Example:

The figure below gives an example of routing of two vehicles with two different destinations using the MARL model:


Model Architecture:

The figure below shows the architecture of the Adaptive Navigation algorithm:

Results:

The figures below show the two test cases: the 5x6 grid network, and the abstracted network of downtown Toronto (DT Toronto) extracted from OpenStreetMap:

Fig.1.1 - DT Toronto
Fig.1.2 - 5x6 Grid.

The figures below show the training results for the two test cases. Q-routing is a deep learning implementation of Boyan and Littman's IP network routing algorithm. The number of GAT layers in the Adaptive Navigation algorithm is a hyperparameter denoted "h".

Fig.2.1 - Toronto Average Travel Time
Fig.2.2 - Toronto Average Travel Time Zoom
Fig.2.3 - 5x6 Average Travel Time
Fig.2.4 - 5x6 Average Travel Time Zoom
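For reference, the Q-routing baseline's core update (Boyan and Littman) can be sketched in a few lines. This is a hypothetical tabular sketch; the repo's implementation approximates the table with a deep network.

```python
def q_routing_update(Qx, Qy, dest, next_hop, wait, travel, alpha=0.5):
    """One Q-routing update at node x after forwarding a vehicle to neighbor y.

    Qx, Qy: dicts mapping (destination, neighbor) -> estimated remaining time.
    wait:   time spent queued at x; travel: x -> y traversal time.
    (Dict-based tables are illustrative only.)"""
    # y's best current estimate of the remaining time to dest
    best_remaining = min(
        (v for (d, _), v in Qy.items() if d == dest), default=0.0
    )
    target = wait + travel + best_remaining
    Qx[(dest, next_hop)] += alpha * (target - Qx[(dest, next_hop)])
    return Qx[(dest, next_hop)]
```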

After training is complete, for testing purposes, we generate 2000 uniformly distributed trips in each test case. We also consider two sensible baselines: (1) SPF (travel-time shortest path first), and (2) SPFWR (travel-time shortest path first with rerouting). Note that SPFWR is computationally prohibitive, as it must recompute all shortest paths at every time step. The table below compares the results of the testing phase (average travel time; lower is better):

|            | DT Toronto | 5x6 Grid |
|------------|-----------:|---------:|
| AN (h=2)   | 479.3      | 145.4    |
| AN (h=1)   | 476.4      | 138.4    |
| AN (h=0)   | 477.6      | 143.7    |
| Q-routing  | inf (loop) | 159.6    |
| SPF        | 551.7      | 173.4    |
| SPFWR      | 475.6      | 205.1    |

In DT Toronto, AN (h=1) performs as well as SPFWR.

On the other hand, counter-intuitively, SPFWR performs poorly in the 5x6 grid network. This can be traced to the low capacity of the network: SPFWR greedily sends all vehicles through the current shortest path and easily congests it. Moreover, SPFWR does not account for waiting times in the traffic-light queues, so it may trigger unnecessary route changes that leave vehicles stuck even further behind the traffic lights. In the 5x6 grid network, AN (h=1) outperforms all other baselines.
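SPFWR's per-step behavior can be sketched as follows (toy code with illustrative names, not the repo's): every vehicle still en route re-solves a shortest-path query under the latest travel times at each timestep. This is both what makes it expensive (one Dijkstra run per vehicle per step) and what lets it pile vehicles onto whichever road is currently shortest.

```python
import heapq

def shortest_route(graph, src, dst):
    """Minimal Dijkstra returning the path under current edge travel times."""
    pq, seen = [(0.0, src, [src])], set()
    while pq:
        d, u, path = heapq.heappop(pq)
        if u == dst:
            return path
        if u in seen:
            continue
        seen.add(u)
        for v, w in graph[u]:
            if v not in seen:
                heapq.heappush(pq, (d + w, v, path + [v]))
    return None

def spfwr_step(graph, vehicles):
    """One SPFWR timestep (sketch): each vehicle recomputes its full
    shortest path from its current position under the latest travel times."""
    return {vid: shortest_route(graph, pos, dest)
            for vid, (pos, dest) in vehicles.items()}

# Travel times change between steps, so routes can flip wholesale.
graph = {"A": [("B", 1.0), ("C", 2.0)], "B": [("D", 1.0)],
         "C": [("D", 0.5)], "D": []}
vehicles = {1: ("A", "D")}
first = spfwr_step(graph, vehicles)      # A-B-D is currently shortest
graph["A"] = [("B", 5.0), ("C", 2.0)]    # congestion slows the A->B road
second = spfwr_step(graph, vehicles)     # everyone reroutes via C
```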

The figure below shows part of the simulation for SPFWR. One can see that the traffic in the network increases as time passes, and that at times some road segments become overpopulated by the SPFWR algorithm.


How to run:

To train the model, you need to specify three arguments:

1. gpu_number: the GPU to train on (used for running several experiments on a server; you can always set it to 0)
2. algorithm_number: [0=AN(2), 1=AN(1), 2=AN(0), 3=Q-routing]
3. network_name: [0="5x6 network", 2="toronto"]

To run the code:

  python run_exp.py gpu_number algorithm_number network_name

For example, to run Adaptive Navigation with 2 GAT layers on the Toronto network, run:

  python run_exp.py 0 0 2 

Reference:

Accepted at SIGSPATIAL 2022 (https://sigspatial2022.sigspatial.org/accepted-papers/): "Network-aware Multi-agent Reinforcement Learning for the Vehicle Navigation Problem", Fazel Arasteh, Soroush Sheikh Gargar, and Manos Papagelis.