Inside the gym environment, there are two robot speed attributes: self.speed and self.robot_speed. self.robot_speed is set to a constant, while self.speed is the true speed. Yet the reward function uses self.robot_speed instead of self.speed (check this). I think this causes the reward mis-specification problem (i.e. DDPG learns a trivial policy), since the speed term in the reward would then not depend on what the agent actually does. Can one of the repo creators check whether this is indeed an error? Thanks! (I just restarted my run and will check whether this solves the issue.)
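To make the concern concrete, here is a minimal sketch of what I suspect is happening. It is not the repo's code: `ToyEnv`, `step`, the two reward methods, and `compare_rewards` are hypothetical names, and the reward (simply equal to the speed) is a placeholder I made up for illustration. Only the attribute names self.speed and self.robot_speed come from the environment.

```python
class ToyEnv:
    """Hypothetical stand-in for the environment, NOT the repo's class."""

    def __init__(self, robot_speed=1.0):
        self.robot_speed = robot_speed  # constant, never updated after init
        self.speed = 0.0                # true speed, changes with the action

    def step(self, action):
        # Assumption for illustration: the action directly sets the true speed.
        self.speed = float(action)

    def reward_from_constant(self):
        # Suspected current behaviour: the reward reads the constant.
        return self.robot_speed

    def reward_from_true_speed(self):
        # Proposed behaviour: the reward reads the true speed.
        return self.speed


def compare_rewards(actions):
    """Return both reward variants for each action in `actions`."""
    env = ToyEnv()
    from_constant, from_true = [], []
    for action in actions:
        env.step(action)
        from_constant.append(env.reward_from_constant())
        from_true.append(env.reward_from_true_speed())
    return from_constant, from_true


if __name__ == "__main__":
    constant_rewards, true_rewards = compare_rewards([0.0, 0.5, 1.0])
    print("reward using self.robot_speed:", constant_rewards)
    print("reward using self.speed:      ", true_rewards)
```

In this toy version, the reward computed from self.robot_speed is identical for every action, so it gives the agent no learning signal about speed. If the real reward behaves the same way, that would be consistent with DDPG settling on a trivial policy.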