aleju / self-driving-truck

Self-Driving Truck in Euro Truck Simulator 2, trained via Reinforcement Learning
MIT License

About

This repository contains code to train and run a self-driving truck in Euro Truck Simulator 2. The resulting AI automatically steers, accelerates and brakes. It is trained (mostly) via reinforcement learning and only has access to the keys W, A, S and D (i.e. it cannot directly set the steering wheel angle).

Example video:

[Example video]

Architecture and method

The basic training method follows the standard reinforcement learning approach from the original Atari paper. Additionally, the Q-values are split into V (value) and A (advantage), as described in Dueling Network Architectures for Deep Reinforcement Learning. Further, the model tries to predict future states and rewards, similar to the description in Deep Successor Reinforcement Learning. (While that paper only generates predictions for the next timestep, here predictions for the next T timesteps are generated via an LSTM.) To make training faster, a semi-supervised pretraining is applied to the first stage of the whole model (similar to Loss is its own Reward: Self-Supervision for Reinforcement Learning, though here only applied once at the start). That pretraining uses some manually created annotations (e.g. positions of cars and lanes in example images) as well as some automatically generated ones (e.g. Canny edges, optical flow).
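The V/A split mentioned above can be illustrated in a few lines. This is a minimal NumPy sketch of the dueling combination from the Dueling Network Architectures paper, not the repository's actual PyTorch model; the example values are made up.

```python
import numpy as np

def dueling_q_values(v, a):
    """Combine a state value V (one scalar per state) and per-action
    advantages A into Q-values: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
    return v + a - a.mean(axis=-1, keepdims=True)

# one state, 9 actions (W, W+A, W+D, S, S+A, S+D, A, D, none)
v = np.array([[2.0]])
a = np.array([[0.5, 0.1, -0.2, -1.0, -1.1, -1.2, 0.0, -0.1, 0.3]])
q = dueling_q_values(v, a)  # shape (1, 9); highest Q here is for action W
```

Subtracting the advantage mean makes the decomposition identifiable: the mean of the Q-values equals V, and the advantages only encode per-action differences.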

Architecture visualization:

[Architecture diagram]

There are five components:

Aside from these, there is also an autoencoder component applied to the embeddings of Embedder 2. However, that component is only trained for some batches, so it is skipped here.

During application, each game state (i.e. a frame/screenshot at 10 fps) is embedded via convolutions and fully connected layers into a vector. From that vector, future embeddings (the successors) are predicted. Each such prediction (per timestep) depends on a chosen action (e.g. pressing W+A followed by two times W converts game state vector X into Y). With 9 possible actions (W, W+A, W+D, S, S+A, S+D, A, D, none) and 10 timesteps (i.e. looking 1 second into the future), there are roughly 3.5 billion possible chains of actions. This number is reduced to roughly 400 sensible plans (e.g. 10x W, 10x W+A, 3x W+A followed by 7x W, ...). For each such plan, the successors are generated and their rewards are predicted (which can be done reasonably fast as the embedding vector has only 512 components). The plans are ranked by the V-values of their last timesteps and the plan with the highest V-value is chosen. (The predicted direct rewards of the successors are currently ignored in the ranking, which seemed to improve the driving a bit.)
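The plan-generation and ranking step can be sketched as follows. The successor predictor and V-value estimator are hand-written stand-ins here (in the repository they are learned networks operating on 512-dim embeddings), and the particular reduced plan set is an assumption for illustration.

```python
ACTIONS = ["W", "W+A", "W+D", "S", "S+A", "S+D", "A", "D", "none"]

# Exhaustive enumeration over 10 timesteps would be 9**10 chains (~3.5 billion):
n_exhaustive = len(ACTIONS) ** 10

def make_plans():
    """A reduced set of 'sensible' plans: constant plans plus
    k steps of steering followed by straight driving."""
    plans = [[a] * 10 for a in ACTIONS]                   # e.g. 10x W, 10x W+A
    for steer in ("W+A", "W+D"):
        for k in range(1, 10):
            plans.append([steer] * k + ["W"] * (10 - k))  # k steer, then straight
    return plans

def predict_v(state, plan):
    """Stand-in for successor prediction + V-value of the last timestep:
    here it simply favors forward driving over steering or braking."""
    return sum(1.0 if a == "W" else 0.5 if a.startswith("W") else -1.0
               for a in plan)

def best_plan(state, plans):
    # rank plans by the (predicted) V-value of their last timestep
    return max(plans, key=lambda p: predict_v(state, p))

plans = make_plans()
chosen = best_plan(state=None, plans=plans)  # with this toy V: 10x W
```

With the real learned V-function the ranking of course depends on the predicted embeddings, but the control flow (enumerate plans, roll out successors, rank by terminal V) is the same.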

Reward Function

For a transition (s, a, r, s'), the reward r depends mainly on the measured speed at s'. The formula is r = sp*rev + o + d, where

The speed is read from the game screen (it is shown in the route advisor). Similarly, offences and damage can be recognized using simple pixel comparisons or instance matching in the area of the route advisor (both events cause messages to be shown there).
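A minimal sketch of the reward formula r = sp*rev + o + d. The exact meanings and magnitudes of the terms are assumptions for illustration: sp is taken as the speed read from the route advisor, rev as a sign factor that flips when driving in reverse, and o/d as fixed penalties applied when an offence/damage message is detected on screen.

```python
def reward(speed_kmh, reversing, offence_shown, damage_shown,
           offence_penalty=-50.0, damage_penalty=-50.0):
    # assumed term meanings (see lead-in); penalties are made-up constants
    rev = -1.0 if reversing else 1.0
    o = offence_penalty if offence_shown else 0.0
    d = damage_penalty if damage_shown else 0.0
    return speed_kmh * rev + o + d
```

Making the reward speed-dominated encourages the agent to keep moving forward, while the event penalties discourage collisions and traffic offences.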

Difficulties

ETS2 is a harder game to play (for an AI) than it may seem at first glance. Some of the difficulties are:

And of course, on top of these things, the standard problems of window handling occur (e.g. locating the window, reading its pixel content, detecting whether the game is paused, reading the truck's current speed, sending keypresses to the window, etc.). Also, the whole architecture has to run in (almost-)realtime: at 10 fps, theoretically at most 100 ms per state, but preferably <=50 ms so that actions have a decent effect before the next screen is observed.
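The realtime budget above amounts to a fixed-rate control loop. This is a generic sketch, not the repository's code; grab_screen, choose_action and send_keys are hypothetical placeholders for the screenshot, planning and keypress components.

```python
import time

def control_loop(grab_screen, choose_action, send_keys, fps=10, steps=3):
    # at 10 fps each observe->plan->act iteration has a budget of 100 ms;
    # finishing well under budget leaves the keypresses time to take effect
    # before the next screenshot is taken
    budget = 1.0 / fps
    for _ in range(steps):
        t0 = time.monotonic()
        state = grab_screen()
        send_keys(choose_action(state))
        elapsed = time.monotonic() - t0
        if elapsed > budget:
            print("over budget: %.0f ms" % (elapsed * 1000))
        time.sleep(max(0.0, budget - elapsed))
```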

Limitations

The AI can -- to a degree -- drive on streets that have solid, continuous objects on both sides. On such roads, hitting the side is usually not fatal, as the truck is deflected by the wall and can keep driving (albeit damaged). That is in contrast to e.g. streets without any objects on the side, where the AI can drive off the road and then get stuck on some tiny hill/object or run into an invisible wall. As a consequence, the AI is best at driving on highways, which usually have such walls or railings on both sides (and are often quite wide). However, it does not care about lanes and not that much about other cars (though it seems to recognize them). In general, the AI's driving capabilities are still far from human level.

Typical problems of the AI are:

Note that all of this is based on the results after a few days of training. More training might yield better behavior.

Usage

Requirements and dependencies

Hardware

System

Python libraries

Install

Game Configuration

Apply model

Run full training