LARG / HFO

Half Field Offense in Robocup 2D Soccer
MIT License

Is there a way to make HFO deterministic? #67

Closed kxxwz closed 6 years ago

kxxwz commented 6 years ago

Hi @mhauskn, I set the same --seed for each game, used --fullstate, and took the same action at each timestep. I got the same initial state for each game, but the subsequent states differed. Here is the code I used for testing:

state = hfo.getState() # initial state s_1
print(state)

hfo.act(DASH, 4, 0) # a_1
hfo.step()
state = hfo.getState() # s_2
print(state)

hfo.act(DASH, 4, 0) # a_2
hfo.step() 
state = hfo.getState() # s_3
print(state)

hfo.act(DASH, 4, 0) # a_3
hfo.step()
state = hfo.getState() # s_4
print(state)

The outputs for 3 games are as follows: GAME 1:

[-0.00783491  0.304353    0.         -0.60117686  0.309754   -1.
 -0.07788175 -0.12915528 -0.717829   -2.         -1.          1.        ]
[-0.00783491  0.304353    0.         -0.60117686  0.309754   -1.
 -0.07788175 -0.12915528 -0.717829   -2.          1.          1.        ]
[-0.00706983  0.30430746  0.         -0.60117686  0.309754   -1.
 -0.0786112  -0.12924314 -0.7176385  -2.          1.          1.        ]
[-0.00602221  0.30428338  0.         -0.60117686  0.309754   -1.
 -0.07959193 -0.12937814 -0.71738756 -2.          1.          1.        ]

GAME 2:

[-0.00783491  0.304353    0.         -0.60117686  0.309754   -1.
 -0.07788175 -0.12915528 -0.717829   -2.         -1.          1.        ]
[-0.00783491  0.304353    0.         -0.60117686  0.309754   -1.
 -0.07788175 -0.12915528 -0.717829   -2.          1.          1.        ]
[-0.00704449  0.30435562  0.         -0.60117686  0.309754   -1.
 -0.07861197 -0.12926483 -0.7176455  -2.          1.          1.        ]
[-0.00590158  0.3042941   0.         -0.60117686  0.309754   -1.
 -0.07969844 -0.12939876 -0.7173623  -2.          1.          1.        ]

GAME 3:

[-0.00783491  0.304353    0.         -0.60117686  0.309754   -1.
 -0.07788175 -0.12915528 -0.717829   -2.         -1.          1.        ]
[-0.00706983  0.30436897  0.         -0.60117686  0.309754   -1.
 -0.07858217 -0.12926644 -0.71765506 -2.          1.          1.        ]
[-0.00605714  0.30442786  0.         -0.60117686  0.309754   -1.
 -0.07949132 -0.12942815 -0.71743464 -2.          1.          1.        ]
[-0.0049333   0.3045268   0.         -0.60117686  0.309754   -1.
 -0.08048397 -0.12962067 -0.71719897 -2.          1.          1.        ]

I also noticed another problem. At the initial state s_1, I let the agent take action DASH(20,0), but in GAME 1 and GAME 2 the following state s_2 has the same position values as s_1.
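One way to pinpoint where two runs diverge is to compare the printed states step by step. The following is a minimal sketch, not part of HFO: `first_divergence` is a hypothetical helper, and for brevity the sample data keeps only the first two features of each state printed above.

```python
import math

def first_divergence(run_a, run_b, tol=1e-9):
    """Return the index of the first step where two runs differ, or None."""
    for step, (state_a, state_b) in enumerate(zip(run_a, run_b)):
        if len(state_a) != len(state_b):
            return step
        for x, y in zip(state_a, state_b):
            # Purely absolute comparison: rel_tol is disabled on purpose.
            if not math.isclose(x, y, rel_tol=0.0, abs_tol=tol):
                return step
    return None

# First two features of s_1..s_4, copied from the outputs above.
game1 = [(-0.00783491, 0.304353), (-0.00783491, 0.304353),
         (-0.00706983, 0.30430746), (-0.00602221, 0.30428338)]
game2 = [(-0.00783491, 0.304353), (-0.00783491, 0.304353),
         (-0.00704449, 0.30435562), (-0.00590158, 0.3042941)]
game3 = [(-0.00783491, 0.304353), (-0.00706983, 0.30436897),
         (-0.00605714, 0.30442786), (-0.0049333, 0.3045268)]

print(first_divergence(game1, game2))  # 2, i.e. the runs first differ at s_3
print(first_divergence(game1, game3))  # 1, i.e. the runs first differ at s_2
```

Indices are zero-based, so index 2 corresponds to s_3. With a fully deterministic server, every pair of runs would be expected to return `None`.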

mhauskn commented 6 years ago

I was unable to make hfo deterministic. I think the underlying rcssserver is the source of the stochastic behavior.

kxxwz commented 6 years ago

I think I can make HFO deterministic now. Just add the following lines

'server::player_rand=0 ' \
'server::ball_rand=0 ' \
'server::kick_rand=0 ' \
'server::wind_rand=0 ' \

to serverOptions in ./bin/HFO https://github.com/LARG/HFO/blob/269c6b694e86ee5266c897f2727f2e0b7d5f10a0/bin/HFO#L68

mhauskn commented 5 years ago

This is awesome! I would welcome a pull request that added an HFO flag to switch on/off determinism.
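A flag like this could be sketched roughly as follows. This is only an illustration, not code from the HFO repository: the `--deterministic` flag name and the `noise_options` helper are assumptions, and only the four `server::*_rand` parameters come from the comment above.

```python
import argparse

# rcssserver noise parameters named earlier in this thread.
NOISE_PARAMS = ("player_rand", "ball_rand", "kick_rand", "wind_rand")

def noise_options(deterministic):
    """Return extra server options that zero out noise, or '' if not requested."""
    if not deterministic:
        return ""
    # Each option ends with a space so it can be concatenated with other options.
    return "".join("server::%s=0 " % name for name in NOISE_PARAMS)

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical flag; the name is an assumption, not an existing HFO option.
    parser.add_argument("--deterministic", action="store_true",
                        help="Set the server noise parameters to 0.")
    return parser.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()
    print(noise_options(args.deterministic))
```

The returned string is meant to be appended to the existing server options string, so leaving the flag off keeps the current stochastic behavior unchanged.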
