Closed DanieleLiuni closed 3 weeks ago
Which reward function are you using, for
Ok, I get the point. I'm asking because in my simulation I see a strange behaviour. I'm using the following reward:

```python
lambda1 = -2
reward = lambda1 * d_EE
if d_EE < self.distance_threshold:
    reward = 5 - 100 * d_EE
```

where `self.distance_threshold` is equal to 0.03. This way I obtain a positive reward when I "achieve the goal" (entering the safe zone), but I also provide a gradient to keep moving toward the target point. I don't know why, but after my body reaches the target it starts to move back and forth at maximum action around the target (going in and out of the target zone).
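For reference, here is that reward written out as a standalone function, with a quick check of the values on either side of the threshold (a minimal sketch; `d_EE` is just a scalar distance here, and `reward_fn` is a name I'm introducing for illustration):

```python
def reward_fn(d_EE, distance_threshold=0.03, lambda1=-2):
    # Dense penalty proportional to the end-effector/target distance
    reward = lambda1 * d_EE
    # Inside the safe zone: positive reward that still decreases
    # with distance, so the gradient keeps pointing at the target
    if d_EE < distance_threshold:
        reward = 5 - 100 * d_EE
    return reward

# Note the jump at the boundary: just outside the zone the reward
# is about -0.06, just inside it is about +2.1
print(reward_fn(0.031))
print(reward_fn(0.029))
```

So crossing the threshold changes the reward discontinuously, which may be relevant to the in-and-out behaviour.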
Send a video of your resulting behavior
And what is d_EE?
d_EE is the distance between the white sphere and the target (red sphere). I defined the action space as:

```python
self.action_space = spaces.Box(-0.04, 0.04, shape=(self.n_actions,), dtype="float32")
```

So it can take any action between -0.04 and +0.04. After the goal is reached, it starts oscillating: practically, it keeps moving by applying action +0.04, then at the next step action -0.04, then +0.04 again, and so on.
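A toy 1D version of what I see (assuming each action translates roughly into a displacement of the same magnitude): starting on the target and alternating saturated actions makes the body leave and re-enter the 0.03 zone on every step, since the maximum step (0.04) is larger than the zone radius:

```python
distance_threshold = 0.03  # radius of the safe zone
max_step = 0.04            # saturated action magnitude

# Start on the target and alternate +max_step / -max_step,
# which is the back-and-forth pattern I observe
pos = 0.0
for step in range(6):
    action = max_step if step % 2 == 0 else -max_step
    pos += action
    inside = abs(pos) < distance_threshold
    print(f"step {step}: pos={pos:+.2f}  inside zone: {inside}")
```

Because 0.04 > 0.03, a single saturated action is always enough to carry the body across the whole safe zone.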
This looks to me like a simulation error; you will have to ask the MuJoCo developers.
From the Fetch environments I observe that, even when the target position is reached, the mocap (and the robot) continues to move around the target. Is this only related to the number of learning timesteps, or to how the mocap works? I also have this problem in my simulation, where I want to reach a target point while minimizing oscillations of a DLO. If you have any hints about this particular behaviour, please let me know.