LARG / HFO

Half Field Offense in Robocup 2D Soccer
MIT License

weird state feature at the last timestamp of an episode #71

Open kxxwz opened 5 years ago

kxxwz commented 5 years ago

Hi @mhauskn! I ran into a rather strange phenomenon. I started the server with the following command:

./bin/HFO --untouched-time=-1 --offense-agents=1 --port=6000 --frames-per-trial=50 --fullstate --trials=30 --headless

This produces 31 trials: the first trial has 52 frames, and the last, invalid trial has 2 frames. If you set --trials=n, there are always n+1 trials. Stranger still, there is always a dramatic change in the last state of each episode. Take feature 53, Ball Dist [Proximity], as an example; here is the data from two episodes, each with 50 states.

array([0.6643038 , 0.6643038 , 0.6718993 , 0.68713   , 0.7070354 ,
       0.7331321 , 0.75905573, 0.7708609 , 0.7824849 , 0.80540454,
       0.82609594, 0.83467364, 0.8523755 , 0.86551416, 0.88019145,
       0.8940145 , 0.91583693, 0.92692685, 0.93553746, 0.9396484 ,
       0.95106804, 0.95858157, 0.9627873 , 0.9741982 , 0.98768985,
       0.98736846, 0.9876889 , 0.9876902 , 0.98769045, 0.9876895 ,
       0.9876915 , 0.98768973, 0.987689  , 0.9876877 , 0.98768914,
       0.98697865, 0.975237  , 0.95919   , 0.9525161 , 0.94982505,
       0.9505818 , 0.9519601 , 0.97131395, 0.987689  , 0.98768973,
       0.9876896 , 0.98576045, 0.9811977 , 0.9668609 , 0.33604002],
      dtype=float32)
array([0.33604002, 0.33604002, 0.3362168 , 0.34301972, 0.3602388 ,
       0.3745818 , 0.39492488, 0.42026985, 0.43823254, 0.45506883,
       0.4697199 , 0.48749197, 0.5065907 , 0.5309875 , 0.5599154 ,
       0.58340025, 0.59793174, 0.6224247 , 0.6369306 , 0.6489608 ,
       0.6613959 , 0.67809916, 0.6956341 , 0.7131889 , 0.73507845,
       0.7612641 , 0.7831397 , 0.8053763 , 0.82398236, 0.84805846,
       0.86772907, 0.89409256, 0.91450524, 0.93750036, 0.94718933,
       0.9620631 , 0.9691874 , 0.97728264, 0.98402476, 0.9876324 ,
       0.987689  , 0.9876889 , 0.9876896 , 0.9791968 , 0.9572971 ,
       0.9483985 , 0.94483197, 0.9505987 , 0.958763  , 0.24727392],
      dtype=float32)

Since the agent never kicks the ball, the last value must be wrong. I also plotted the distance between the goal and the ball. [plot not included] This episode ends with a GOAL, haha. Also, the agent's first action seems to have no effect on the environment. Have you ever run into these problems? Any help is much appreciated!
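To make the size of that jump concrete, here is a minimal sketch that compares the final step change of an episode against the largest change seen earlier. The helper name `last_step_jump` is hypothetical, and the values are the last five entries of the first array posted above.

```python
def last_step_jump(series):
    """Return (largest earlier step change, final step change) for one episode."""
    # Absolute change between each pair of consecutive values.
    diffs = [abs(b - a) for a, b in zip(series, series[1:])]
    return max(diffs[:-1]), diffs[-1]


# Last five values of the first episode (feature 53, ball proximity).
tail = [0.9876896, 0.98576045, 0.9811977, 0.9668609, 0.33604002]

typical, final = last_step_jump(tail)
# The final change is more than ten times the largest earlier change.
print(final > 10 * typical)  # True
```

The factor of ten is an arbitrary threshold chosen for illustration; it is not something HFO defines.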

mhauskn commented 5 years ago

It's possible there could be an off-by-one error in the code that decides to stop when trials is reached.

Sometimes I have observed that the simulator doesn't respond to the first action and sometimes it takes a frame or two before state features are initialized with correct observations.

It seems that the last observation may be what happens when the field is reset for the next episode, rather than when the ball is in the goal.

Good observations. I would happily accept pull requests if you feel enthusiastic about fixing either problem.

kxxwz commented 5 years ago

Could I ask how you dealt with the weird last state features? In my opinion, since the last state is wrong, we shouldn't add the last transition to the replay buffer. However, since the last transition includes the important terminal state, we cannot just ignore it.
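One common way around this dilemma, sketched below, is to keep the terminal transition but store a `done` flag with it. In one-step Q-learning the target for a terminal transition is just the reward, so the (possibly wrong) next observation is never used. This is a generic sketch, not HFO code: `GAMMA`, `td_target`, `store`, and the buffer layout are all assumptions, and it presumes the reward comes from the episode status rather than from the state features.

```python
from collections import deque

GAMMA = 0.99  # hypothetical discount factor

# Replay buffer of (obs, action, reward, next_obs, done) tuples.
buffer = deque(maxlen=100_000)


def store(obs, action, reward, next_obs, done):
    """Keep every transition, including the terminal one."""
    # When done is True, next_obs may be the post-reset observation.
    # It is stored only as a placeholder; td_target never reads it.
    buffer.append((obs, action, reward, next_obs, done))


def td_target(reward, done, next_q_max):
    """One-step Q-learning target; no bootstrapping on terminal transitions."""
    return reward if done else reward + GAMMA * next_q_max


# Terminal transition: the estimate for the next state is ignored.
print(td_target(1.0, True, 123.0))  # 1.0
```

With this layout the terminal reward still reaches the learner, while the suspicious last observation has no effect on the update.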

kxxwz commented 5 years ago

Yes, you are right. I just found that the last observation of each episode is just the same as the first observation of the next episode.
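A quick way to confirm this on logged data is to compare each episode's last value with the next episode's first value. The sketch below uses a hypothetical helper, `leaked_reset_indices`, on shortened versions of the two episodes posted above (first, second-to-last, and last value of each).

```python
def leaked_reset_indices(episodes, tol=1e-6):
    """Indices i where episode i ends on the value episode i+1 starts with."""
    leaks = []
    for i in range(len(episodes) - 1):
        if abs(episodes[i][-1] - episodes[i + 1][0]) <= tol:
            leaks.append(i)
    return leaks


# Shortened feature-53 traces from the two episodes in the report.
episode_1 = [0.6643038, 0.9668609, 0.33604002]
episode_2 = [0.33604002, 0.958763, 0.24727392]

print(leaked_reset_indices([episode_1, episode_2]))  # [0]
```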

mhauskn commented 5 years ago

It may be possible to fix this by delaying the resetting of the field by one step.

