Issue opened by kxxwz 6 years ago (status: Open)
It's possible there is an off-by-one error in the code that decides when to stop once the trials limit is reached.
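To illustrate how such an off-by-one could arise, here is a minimal sketch. It assumes, hypothetically, that the stop check compares a 0-based trial counter with `<=` instead of `<`. The function names are made up for illustration and are not HFO's actual code.

```python
from typing import Callable


def continue_buggy(trial: int, max_trials: int) -> bool:
    # Hypothetical buggy check: with a 0-based counter, `<=` admits
    # trial indices 0..max_trials, i.e. max_trials + 1 trials.
    return trial <= max_trials


def continue_fixed(trial: int, max_trials: int) -> bool:
    # Correct check for a 0-based counter: indices 0..max_trials - 1.
    return trial < max_trials


def trials_run(max_trials: int, should_continue: Callable[[int, int], bool]) -> int:
    """Count how many trials start before `should_continue` says stop."""
    trial = 0  # 0-based index of the next trial to start
    while should_continue(trial, max_trials):
        trial += 1  # one more trial played
    return trial


print(trials_run(30, continue_buggy))  # 31
print(trials_run(30, continue_fixed))  # 30
```

With a limit of 30, the `<=` variant runs 31 trials. That matches the n+1 pattern reported later in this thread, though the real cause in the server may differ.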
I have sometimes observed that the simulator doesn't respond to the first action, and that it can take a frame or two before the state features are initialized with correct observations.
It seems that the last observation may reflect the field after it has been reset for the next episode, rather than the moment the ball is in the goal.
Good observations. I would happily accept pull requests if you feel enthusiastic about fixing either problem.
Could I ask how you dealt with the problem of the "weird" last state features? In my opinion, since the last state is incorrect, we shouldn't add the last transition to the replay buffer. However, because the last transition contains the important terminal state, we can't just ignore it.
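One common way to keep the terminal transition without trusting its final observation is the standard one-step Q-learning target with terminal masking. This is a swapped-in technique, not something the thread participants confirmed using. When `done` is True the bootstrap term is dropped, so the reward and the done flag still train the agent while the suspect next state has no effect on the target. The `Transition` class and `td_target` function are hypothetical names for this sketch.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Transition:
    state: Sequence[float]
    action: int
    reward: float
    next_state: Sequence[float]  # may be the suspect post-reset observation
    done: bool


def td_target(t: Transition, next_q_max: float, gamma: float = 0.99) -> float:
    """One-step Q-learning target r + gamma * max_a' Q(s', a'), masked at episode end.

    `next_q_max` is max_a' Q(t.next_state, a') from whatever network you use.
    When `t.done` is True the bootstrap term is dropped, so the value of
    `t.next_state` (and hence of `next_q_max`) has no effect on the target.
    """
    if t.done:
        return t.reward
    return t.reward + gamma * next_q_max


# Terminal transition whose next_state is garbage: the target is just the reward.
goal = Transition(state=[0.1, 0.2], action=0, reward=1.0, next_state=[9.9, 9.9], done=True)
print(td_target(goal, next_q_max=123.0))  # 1.0
```

This only helps if nothing else consumes `next_state` for terminal transitions. Methods such as n-step returns, recurrent agents, or learned models may still read it.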
Yes, you are right. I just found that the last observation of each episode is identical to the first observation of the next episode.
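To check logged data for this symptom, a minimal sketch could compare each episode's last observation with the next episode's first. It assumes, hypothetically, that episodes are stored as lists of feature vectors. The final episode can't be checked because it has no successor.

```python
from typing import List, Sequence


def stale_terminal_episodes(
    episodes: List[List[Sequence[float]]], tol: float = 1e-9
) -> List[int]:
    """Return indices i whose last observation matches the first observation of episode i + 1."""
    flagged = []
    for i in range(len(episodes) - 1):
        if not episodes[i] or not episodes[i + 1]:
            continue  # nothing to compare
        last_obs = episodes[i][-1]
        next_first = episodes[i + 1][0]
        # Element-wise comparison with a small tolerance for float features.
        if len(last_obs) == len(next_first) and all(
            abs(a - b) <= tol for a, b in zip(last_obs, next_first)
        ):
            flagged.append(i)
    return flagged


episodes = [
    [[0.0, 1.0], [0.5, 0.5], [9.0, 9.0]],
    [[9.0, 9.0], [1.0, 2.0], [3.0, 3.0]],
    [[4.0, 4.0], [5.0, 5.0]],
]
print(stale_terminal_episodes(episodes))  # [0]
```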
It may be possible to fix this by delaying the field reset by one step.
(Replying by email to Hongjie's comment of Tue, Aug 21, 2018, 10:42 AM: https://github.com/LARG/HFO/issues/71#issuecomment-414760561)
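The suggested fix can be illustrated with a toy model. This is a hypothetical 1-D stand-in for the field, not HFO's server code or API. The ball advances one unit per step and a goal is scored at `goal_x`. With an eager reset, the terminal observation is overwritten by the reset field. With the reset delayed by one step, the goal state is reported first and the reset happens at the start of the next episode.

```python
from typing import List, Tuple


class ToyField:
    """Hypothetical toy field: ball moves +1 per step; goal when ball_x >= goal_x."""

    def __init__(self, goal_x: int = 3, delay_reset: bool = False):
        self.goal_x = goal_x
        self.delay_reset = delay_reset
        self.ball_x = 0
        self.pending_reset = False

    def _reset_field(self) -> None:
        self.ball_x = 0

    def step(self) -> Tuple[int, bool]:
        """Advance one step and return (observation, episode_over)."""
        if self.pending_reset:
            # Delayed variant: the reset scheduled on the goal step happens now.
            self._reset_field()
            self.pending_reset = False
        self.ball_x += 1
        if self.ball_x >= self.goal_x:
            if self.delay_reset:
                self.pending_reset = True  # report the goal state first
            else:
                self._reset_field()  # eager reset overwrites the goal state
            return self.ball_x, True
        return self.ball_x, False


def run_episode(field: ToyField) -> List[int]:
    """Collect observations until the episode ends."""
    log = []
    while True:
        obs, over = field.step()
        log.append(obs)
        if over:
            return log


print(run_episode(ToyField(3, delay_reset=False)))  # [1, 2, 0]
print(run_episode(ToyField(3, delay_reset=True)))   # [1, 2, 3]
```

In the eager version the episode ends with the reset value 0 instead of the goal position 3, which mirrors the symptom described above.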
Hi @mhauskn! I've run into a rather strange phenomenon. I ran the following command to start the server:
This produces 31 trials: the first trial has 52 frames, and the last, invalid trial has only 2 frames. If you set -trials=n, there are always n+1 trials.
Stranger still, the last state of every episode changes dramatically. Take feature 53, Ball Dist [Proximity], as an example: here is the data from two episodes, each with 50 states. Since the agent never kicks the ball, the last value must be wrong. I also plotted the distance between the goal and the ball; this episode ends with a GOAL. In addition, the agent's first action seems to have no effect on the environment. Have you ever run into these problems? Any help would be much appreciated!
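One hedged way to find affected episodes in your own logs is to compare a feature's final step-to-step change with its earlier changes. You could apply this, for example, to the column holding Ball Dist [Proximity]. The function name and the `factor` of 5 are arbitrary assumptions. A genuine goal-scoring step could also trip the check, so treat hits as candidates to inspect rather than proof of the bug.

```python
from typing import Sequence


def final_step_is_outlier(values: Sequence[float], factor: float = 5.0) -> bool:
    """Flag a trajectory whose last change is much larger than any earlier change.

    `factor` is a hypothetical threshold; tune it per feature.
    """
    if len(values) < 3:
        return False  # need at least two earlier values to compare against
    deltas = [abs(b - a) for a, b in zip(values, values[1:])]
    earlier_max = max(deltas[:-1])
    last = deltas[-1]
    if earlier_max == 0.0:
        return last > 0.0  # any change after a flat trajectory is suspicious
    return last > factor * earlier_max


smooth = [0.50, 0.52, 0.54, 0.56, 0.58]
jump = [0.50, 0.52, 0.54, 0.56, 0.95]
print(final_step_is_outlier(smooth))  # False
print(final_step_is_outlier(jump))    # True
```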