Closed DanieleLiuni closed 3 weeks ago
Which reward function are you using, for
Ok, I get the point. I'm asking because in my simulation I see a strange behaviour. I'm using the following reward:

```python
lambda1 = -2
reward = lambda1 * d_EE
if d_EE < self.distance_threshold:
    reward = 5 - 100 * d_EE
```

where `self.distance_threshold` is equal to 0.03. This way I obtain a positive reward when I "achieve the goal" (entering the safe zone), but I also provide a gradient to keep moving toward the target point. I don't know why, but after my body reaches the target it starts to move back and forth at maximum action around the target (going in and out of the target zone).
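For reference, here is that reward written out as a standalone function, with a quick check of the values on either side of the threshold (a minimal sketch; `d_EE` is just a scalar distance here, and `reward_fn` is a name I'm introducing for illustration):

```python
def reward_fn(d_EE, distance_threshold=0.03, lambda1=-2):
    # Dense penalty proportional to the end-effector/target distance
    reward = lambda1 * d_EE
    # Inside the safe zone: positive reward that still decreases
    # with distance, so the gradient keeps pointing at the target
    if d_EE < distance_threshold:
        reward = 5 - 100 * d_EE
    return reward

# Note the jump at the boundary: just outside the zone the reward
# is about -0.06, just inside it is about +2.1
print(reward_fn(0.031))
print(reward_fn(0.029))
```

So crossing the threshold changes the reward discontinuously, which may be relevant to the in-and-out behaviour.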
Send a video of your resulting behavior
And what is d_EE?
d_EE is the distance between the white sphere and the target (red sphere). I defined the action space as:

```python
self.action_space = spaces.Box(-0.04, 0.04, shape=(self.n_actions,), dtype="float32")
```

So it can take any action between -0.04 and +0.04. After the goal is reached, it starts oscillating: practically, it keeps moving by applying action +0.04, then at the next step action -0.04, then +0.04 again, and so on.
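A toy 1D version of what I see (assuming each action translates roughly into a displacement of the same magnitude): starting on the target and alternating saturated actions makes the body leave and re-enter the 0.03 zone on every step, since the maximum step (0.04) is larger than the zone radius:

```python
distance_threshold = 0.03  # radius of the safe zone
max_step = 0.04            # saturated action magnitude

# Start on the target and alternate +max_step / -max_step,
# which is the back-and-forth pattern I observe
pos = 0.0
for step in range(6):
    action = max_step if step % 2 == 0 else -max_step
    pos += action
    inside = abs(pos) < distance_threshold
    print(f"step {step}: pos={pos:+.2f}  inside zone: {inside}")
```

Because 0.04 > 0.03, a single saturated action is always enough to carry the body across the whole safe zone.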
This looks to me like a simulation error; you will have to ask the MuJoCo developers.
From the Fetch environments I observe that, even when the target position is reached, the mocap (and the robot) continues to move around the target. Is this only related to the number of learning timesteps, or to how the mocap works? I also have this problem in my simulation, where I want to reach a target point while minimizing oscillations of a DLO. If you have any hints about this particular behaviour, please let me know.