Refactor Q-Learning code

This PR contains some changes to the Q-Learning code that make it more readable and maintainable:

- Run the update step every time a lidar scan is received, instead of on a timer
- Reuse more code between the training and driving nodes
- Compute the neural network loss only from the Q-values of the actions actually taken, instead of from all actions
- Improve time measurement and debug output
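The loss change can be illustrated with a minimal pure-Python sketch. It assumes the standard one-step Q-learning target `r + gamma * max Q(s', .)` and a mean-squared-error loss; the function names, batch and numbers are hypothetical and not taken from the ar-tu-do code, whose training node works on network output tensors rather than plain lists. Only the entry `Q(s, a)` for the action that was actually taken contributes to the loss.

```python
from typing import List, Sequence


def q_targets(rewards: Sequence[float], next_q: Sequence[Sequence[float]],
              dones: Sequence[bool], gamma: float) -> List[float]:
    # One-step Q-learning target: r + gamma * max_a' Q(s', a'), or just r at a terminal state.
    return [r if done else r + gamma * max(q_next)
            for r, q_next, done in zip(rewards, next_q, dones)]


def loss_taken_actions(q_values: Sequence[Sequence[float]], actions: Sequence[int],
                       targets: Sequence[float]) -> float:
    # Mean squared error over Q(s, a) for the action a that was actually taken;
    # the outputs for all other actions do not enter the loss at all.
    errors = [(q[a] - t) ** 2 for q, a, t in zip(q_values, actions, targets)]
    return sum(errors) / len(errors)


# Hypothetical mini-batch of two transitions with three discrete actions each.
q_values = [[1.0, 2.0, 0.5], [0.0, 1.5, 3.0]]   # Q(s, .) predicted by the network
actions = [1, 2]                                # index of the action taken in each transition
rewards = [1.0, -1.0]
next_q = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]     # Q(s', .) for the successor states
dones = [False, True]                           # the second transition ends the episode
gamma = 0.9

targets = q_targets(rewards, next_q, dones, gamma)
loss = loss_taken_actions(q_values, actions, targets)
print(targets, loss)
```

In PyTorch the same per-row selection is usually written as `q_values.gather(1, actions.unsqueeze(1))` before applying the loss function.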
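Driving the update from the lidar callback instead of a timer can be sketched without ROS. In this hypothetical sketch, `on_scan` stands in for the callback that a real node would register with `rospy.Subscriber`; the class, method names, clipping range and timestamps are illustrative assumptions. With a scan-triggered step, each step acts on exactly one new scan, whereas a timer running at a different rate can act on the same scan several times or skip scans entirely.

```python
from typing import List, Optional, Sequence


class ScanDrivenAgent:
    """Hypothetical agent that performs exactly one step per received lidar scan."""

    def __init__(self) -> None:
        self.steps_taken = 0
        self.processed_scans: List[int] = []

    def on_scan(self, ranges: Sequence[float], stamp_ms: int) -> None:
        # Stand-in for the ROS subscriber callback: every new scan triggers one step.
        state = self.scan_to_state(ranges)
        self.step(state)
        self.processed_scans.append(stamp_ms)

    @staticmethod
    def scan_to_state(ranges: Sequence[float]) -> List[float]:
        # Placeholder preprocessing: clip ranges to an assumed 10 m maximum.
        return [min(r, 10.0) for r in ranges]

    def step(self, state: List[float]) -> None:
        # Placeholder for action selection and, while training, the Q-learning update.
        self.steps_taken += 1


def timer_driven(scan_stamps: Sequence[int], tick_times: Sequence[int]) -> List[Optional[int]]:
    # For contrast: a timer acts on whichever scan arrived last before each tick,
    # so it can reuse a scan (fast timer) or never see one (slow timer).
    return [max((s for s in scan_stamps if s <= t), default=None) for t in tick_times]


scan_stamps = [0, 25, 50, 75]        # hypothetical arrival times of four scans, in ms
agent = ScanDrivenAgent()
for stamp in scan_stamps:
    agent.on_scan([5.0, 12.0, 3.0], stamp)

print(agent.processed_scans)                       # every scan is handled exactly once
print(timer_driven(scan_stamps, [0, 10, 20]))      # fast timer: scan 0 is used three times
print(timer_driven(scan_stamps, [0, 60]))          # slow timer: scan 25 is skipped
```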
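One common way to share code between a driving node and a training node is a base class that holds the common logic, with the training node adding only what is specific to learning. The sketch below is hypothetical: the class names, the three steering actions and the use of epsilon-greedy exploration are assumptions for illustration, not a description of how the ar-tu-do nodes are actually structured.

```python
import random
from typing import Sequence


class QLearningNodeBase:
    """Hypothetical shared base: logic needed by both the driving and the training node."""

    def __init__(self, actions: Sequence[str]) -> None:
        self.actions = list(actions)

    def greedy_action(self, q_values: Sequence[float]) -> int:
        # Index of the action with the highest estimated Q-value.
        return max(range(len(q_values)), key=lambda i: q_values[i])

    def select_action(self, q_values: Sequence[float]) -> int:
        # Driving node: always exploit the learned policy.
        return self.greedy_action(q_values)


class TrainingNode(QLearningNodeBase):
    """Training node: reuses the base class and only adds epsilon-greedy exploration."""

    def __init__(self, actions: Sequence[str], epsilon: float, seed: int = 0) -> None:
        super().__init__(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select_action(self, q_values: Sequence[float]) -> int:
        # With probability epsilon pick a random action, otherwise fall back to the shared greedy choice.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.actions))
        return self.greedy_action(q_values)


actions = ["left", "straight", "right"]   # hypothetical discrete steering actions
q = [0.1, 0.7, 0.3]                       # hypothetical network output for one state
driver = QLearningNodeBase(actions)
trainer = TrainingNode(actions, epsilon=0.0)
explorer = TrainingNode(actions, epsilon=1.0)
print(driver.select_action(q), trainer.select_action(q), explorer.select_action(q))
```

Keeping the greedy choice in the base class means a fix to it applies to both nodes at once.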

This was developed together with changes intended to make the car drive faster with Q-learning. Since those don't work yet, I'm submitting only the refactoring as a pull request for now.

Autonomous-Racing-PG/ar-tu-do, pull request #271