Different multipliers are used for the positive and the negative reward.
Both multipliers start at x1 but can increase or decrease as the rover gets closer to or further from the destination.
positive_multiplier increases as the rover gets closer to the destination with fewer steps taken.
negative_multiplier increases as the rover takes more and more steps or fails to get closer to the destination.
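The multiplier behaviour described above could be sketched roughly as follows. This is only an illustration, not the submitted code: the RewardMultipliers class, the update_multipliers function, and the rate and step_budget values are all hypothetical, and for brevity the sketch only models the increases, not the decreases.

```python
from dataclasses import dataclass


@dataclass
class RewardMultipliers:
    """Hypothetical container; both multipliers start at x1."""
    positive_multiplier: float = 1.0
    negative_multiplier: float = 1.0


def update_multipliers(m, prev_distance, distance, steps,
                       step_budget=100, rate=0.1):
    """Return new multipliers after one step.

    rate and step_budget are assumed tuning values, not taken from the
    original reward code.
    """
    positive = m.positive_multiplier
    negative = m.negative_multiplier

    if distance < prev_distance:
        # Closer to the destination: the boost is larger when fewer
        # steps have been used so far.
        steps_left = max(0.0, 1.0 - steps / step_budget)
        positive += rate * steps_left
    else:
        # No progress towards the destination: grow the penalty.
        negative += rate

    # The penalty also grows with the number of steps taken.
    negative += rate * steps / step_budget
    return RewardMultipliers(positive, negative)
```

For example, update_multipliers(RewardMultipliers(), 10.0, 8.0, steps=10) raises positive_multiplier more than the same call with steps=50, because fewer steps were used to make the same progress.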
The robot estimates the position of an object relative to itself on a graph by comparing the distance it moved with how much closer the object got.
A reward is then awarded according to the distance to the object, as shown in the following figure.
The smaller the x and y values, the harsher the reward. Anything outside the green zone is ignored.
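One possible way to read this estimate is as a triangulation: if the rover moves a known distance in a straight line and measures the distance to the object before and after, the object's offset follows from the two distance circles. The sketch below assumes straight-line motion along the +y axis and a hypothetical function name; it is not necessarily how the submitted code does it, and the sign of x cannot be recovered from distances alone.

```python
import math


def estimate_object_position(moved, dist_before, dist_after):
    """Estimate the object's (x, y) relative to the rover's new position.

    Assumes the rover moved `moved` units straight along +y.
    Only |x| can be recovered, so x is returned as non-negative.
    """
    if moved <= 0:
        raise ValueError("the rover must have moved a positive distance")

    # Intersect the circles x^2 + y^2 = dist_before^2 and
    # x^2 + (y - moved)^2 = dist_after^2 to solve for y.
    y_from_start = (dist_before ** 2 - dist_after ** 2 + moved ** 2) / (2 * moved)

    # Clamp at zero to absorb small measurement or rounding errors.
    x = math.sqrt(max(0.0, dist_before ** 2 - y_from_start ** 2))

    # Shift y so the result is relative to the rover's new position.
    return x, y_from_start - moved
```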
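As a rough illustration of the rule above, the sketch below returns a penalty that gets harsher as x and y shrink and ignores anything outside the zone. The figure is not reproduced here, so the rectangular shape of the green zone, its size (zone_x, zone_y), the linear falloff, and the name object_penalty are all assumptions.

```python
def object_penalty(x, y, zone_x=2.0, zone_y=2.0, max_penalty=1.0):
    """Return a non-positive reward for an object at offset (x, y).

    zone_x, zone_y and max_penalty are assumed values; the real zone
    is defined by the figure in the original write-up.
    """
    # Anything outside the (assumed rectangular) green zone is ignored.
    if abs(x) > zone_x or abs(y) > zone_y:
        return 0.0

    # Closeness is 1 when the object is on top of the rover and falls
    # linearly to 0 at the far corner of the zone.
    closeness = 1.0 - 0.5 * (abs(x) / zone_x + abs(y) / zone_y)
    return -max_penalty * closeness
```

With these assumed values, an object at (0.5, 0.5) is penalised more heavily than one at (1.5, 1.5), and an object at (3, 0) contributes nothing.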
This is the rewards code used to train the model submitted on https://virtual.hackathon.io/
Calculations