CodeReclaimers / neat-python

Python implementation of the NEAT neuroevolution algorithm
BSD 3-Clause "New" or "Revised" License

Unable to train a car to traverse a track #128

Open puneets2811 opened 6 years ago

puneets2811 commented 6 years ago

Car specs - I've modeled the car in Unity and interfaced it with Python using Unity ML-Agents (https://github.com/Unity-Technologies/ml-agents). The car has 5 radar sensors mounted on the front bumper, each of which measures the distance to the nearest obstacle. The car has 2 controls: accelerate (range [-1, 1]) and steer (range [-1, 1]). I've set the configuration file for 5 inputs and 2 [EDIT] outputs, and set the activation to tanh.

Fitness function - The fitness of a genome is calculated by running the car on the track and recording a distance metric based on acceleration values. (More detail about how fitness is calculated is at the end.)


Problem - During training, the car learns not to go backward, but it does not learn to avoid obstacles or follow the track. The fitness values of all genomes always come from a fixed set of float values. After enough iterations, all the genomes tend to have the same fitness value, and training goes on without ever stopping.

I can't figure out what the problem is: is it the modelling of the car, or are the configuration parameters substandard?
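One way to quantify the "fixed set of float values" symptom is to count how many genomes share each fitness value in a generation. The helper below is only a diagnostic sketch. The name `fitness_histogram` and the rounding precision are assumptions for illustration, and it does not explain why the values collapse.

```python
from collections import Counter
from typing import Dict, Iterable


def fitness_histogram(fitnesses: Iterable[float], ndigits: int = 6) -> Dict[float, int]:
    """Count how many genomes share each fitness value.

    Values are rounded to `ndigits` decimals (an arbitrary choice) so that
    tiny floating-point differences do not hide identical fitnesses.
    """
    return dict(Counter(round(f, ndigits) for f in fitnesses))


# Example with made-up fitness values for a population of five genomes.
example = fitness_histogram([0.5, 0.5, 1.5, 1.5, 1.5])
print(example)
```

If the histogram has only a handful of keys generation after generation, the fitness function is not separating genomes that behave differently, which is worth ruling out before tuning the configuration.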

How fitness is calculated (in detail) - Consider 5 time steps of the environment with acceleration values at each step [a1, a2, a3, a4, a5], for all -1 <= ai <= 1, and initial velocity v0 = 0.

First time step [t = 1 time step]:
Distance covered: s1 = v0 t + (1/2) a1 t^2 = a1 / 2
Velocity at the end: v1 = v0 + a1 t = a1 * 1 = a1

Second time step:
Distance covered: s2 = v1 t + (1/2) a2 t^2 = v1 + a2 / 2 = a1 + a2 / 2
Velocity at the end: v2 = v1 + a2 t = v1 + a2 * 1 = a1 + a2

Third time step:
Distance covered: s3 = v2 t + (1/2) a3 t^2 = (a1 + a2) t + (1/2) a3 t^2 = a1 + a2 + a3 / 2
Velocity at the end: v3 = v2 + a3 t = a1 + a2 + a3

... and so on for all time steps.

The following equations of motion are used:
s = ut + (1/2) a t^2
v = u + at
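The derivation above can be sketched in a few lines of Python. This is a minimal illustration of the stated equations, not the poster's actual code; the function name `distance_metric` and the `dt` parameter are assumptions.

```python
from typing import Sequence


def distance_metric(accelerations: Sequence[float], dt: float = 1.0) -> float:
    """Total distance implied by a sequence of per-step accelerations.

    Applies s = u*t + (1/2)*a*t^2 and v = u + a*t at every step,
    starting from rest (v0 = 0).
    """
    velocity = 0.0
    total = 0.0
    for a in accelerations:
        total += velocity * dt + 0.5 * a * dt ** 2  # distance covered in this step
        velocity += a * dt                          # velocity at the end of this step
    return total


# Three steps of full acceleration: s1 + s2 + s3 = 0.5 + 1.5 + 2.5
print(distance_metric([1.0, 1.0, 1.0]))
```

Note that the steering output does not appear anywhere in this metric. As written, steering can only affect fitness indirectly, through how many time steps the car survives before it crashes.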

evolvingfridge commented 6 years ago

It looks like your issue is with the fitness function. I would try something simpler, like the maximum distance traveled in each generation.

abrahamrhoffman commented 6 years ago

@puneets2811, I am also interested to hear whether the change in fitness function affects your cars' long-term progress at solving the track. Keep us in the loop!

puneets2811 commented 6 years ago

@abrahamrhoffman I'd like to know what kind of change in the fitness function you are referring to.

The only fitness function I've used calculates a metric directly proportional to the actual distance traveled by the car before it crashes. I'd like to mention that the car only learns that it shouldn't go backward; it does not learn to steer.

I've tried tanh and clamped (between -1 & 1) activation functions here. The clamped one performs totally random actions.

puneets2811 commented 6 years ago

@evolvingfridge I couldn't completely follow how max distance traveled per generation can be modeled. The better genomes propagate to the next generations automatically.

evolvingfridge commented 6 years ago

@puneets2811 Each genome (car) will travel N meters forward (+) or backward (-) before it crashes. The genomes with the highest positive distance traveled are your best genomes, and they are the ones selected for reproduction from the current population. Also double-check your sensor and steering code to make sure there are no bugs (this has happened to me more than once) :smile:
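The suggestion above might look roughly like the sketch below: each genome's fitness is simply the signed distance its car traveled before crashing. The names `make_eval_genomes`, `build_net` and `run_episode` are hypothetical. In particular, `run_episode` stands in for whatever code drives the Unity environment and reports the measured distance, which the thread does not show.

```python
from typing import Callable, List, Sequence


def make_eval_genomes(
    build_net: Callable,
    run_episode: Callable[[Callable[[Sequence[float]], List[float]]], float],
) -> Callable:
    """Build an evaluation function that scores genomes by distance traveled.

    build_net(genome, config) must return an object with an
    activate(inputs) method that returns the two control outputs.
    run_episode(policy) is a hypothetical helper that runs one car until it
    crashes and returns the signed distance traveled (negative = backward).
    """

    def eval_genomes(genomes, config) -> None:
        # genomes is a list of (genome_id, genome) pairs.
        for _genome_id, genome in genomes:
            net = build_net(genome, config)
            genome.fitness = run_episode(net.activate)

    return eval_genomes
```

With neat-python, `build_net` could be `neat.nn.FeedForwardNetwork.create`, and the returned function could be passed to `Population.run`. The distance is best taken from the car's position in the simulator rather than reconstructed from the commanded accelerations, so that steering into a wall is reflected in the score.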