Guiliang opened 7 years ago
Cut the game — Advantages: a) Easy to explain: the value V(s) is the expectation of the next reward. b) TD-Gammon, as a two-agent model, could be a good baseline.
Cut the game — Disadvantages: a) If we cut the game, it becomes discontinuous; it is no longer the original game. b) The feature "Gametime" could be confusing.
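The "cut the game" option above can be sketched as follows. This is a minimal, hypothetical illustration — the event dictionaries and field names (`type`, `team`) are assumptions, not the project's actual data format:

```python
# Hedged sketch: splitting a play-by-play event sequence into episodes,
# each one ending at a "goal" event. Event fields are hypothetical.

def cut_into_episodes(events):
    """Split a list of events into episodes, each ending at a goal."""
    episodes, current = [], []
    for event in events:
        current.append(event)
        if event["type"] == "goal":
            episodes.append(current)
            current = []
    if current:  # trailing events with no goal (e.g. end of period)
        episodes.append(current)
    return episodes

events = [
    {"type": "pass"}, {"type": "shot"}, {"type": "goal", "team": "home"},
    {"type": "pass"}, {"type": "goal", "team": "away"},
    {"type": "shot"},
]
episodes = cut_into_episodes(events)
print(len(episodes))  # 3: two episodes ending in goals, one trailing segment
```

The trailing segment with no goal is exactly the discontinuity problem mentioned in the disadvantages: those final events have no terminal reward.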
Don't cut the game — Advantages: a) It is the original game, so we can learn what the real game is. b) Gametime could be interesting: as time goes on, a team might have a higher expectation of scoring the next goal.
Don't cut the game — Disadvantages: a) Hard to define what has been learned — the expectation of the total reward obtained in a game? b) Learning becomes harder, since the score in a match also depends on the teams.
Don't cut the game — a) Away −1, Home +1: if the home team scores, we set reward = +1; if the away team scores, we set reward = −1. With this definition, the expected total reward (return) is the expected goal differential.
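Under this uncut-game definition, the claim that the return equals the goal differential can be checked directly. A minimal sketch, assuming a hypothetical event format (`type`, `team` fields are illustrations, not the project's real schema):

```python
# Hedged sketch of the uncut-game reward: +1 when the home team scores,
# -1 when the away team scores, 0 otherwise. Summing rewards over a
# whole game then yields the home-minus-away goal differential.

def reward(event):
    if event.get("type") == "goal":
        return 1 if event["team"] == "home" else -1
    return 0

game = [
    {"type": "shot"},
    {"type": "goal", "team": "home"},
    {"type": "goal", "team": "away"},
    {"type": "goal", "team": "home"},
]
goal_diff = sum(reward(e) for e in game)
print(goal_diff)  # 2 home goals - 1 away goal = 1
```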
Cut the game — a) Away +1, Home +1: the expected reward function represents the probability that, if play starts in state s, a random walk of unbounded length through the state space ends with a goal for the home team or for the away team.
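With each episode ending in a +1 reward tagged by the scoring team, a simple Monte Carlo estimate of the home value of a state is the fraction of episodes from that state that end in a home goal. A minimal sketch — the state labels and episode data here are hypothetical placeholders, not real features:

```python
# Hedged sketch of the cut-game view: V_home(s) estimated as the
# fraction of observed episodes starting from state s that end with a
# home goal. States are hypothetical string labels for illustration.
from collections import defaultdict

counts = defaultdict(lambda: {"home": 0, "away": 0})

# Hypothetical data: (start state, team that scored at episode's end).
episodes = [("power_play", "home"), ("power_play", "home"),
            ("power_play", "away"), ("even_strength", "away")]

for state, scorer in episodes:
    counts[state][scorer] += 1

def v_home(state):
    c = counts[state]
    total = c["home"] + c["away"]
    return c["home"] / total if total else 0.5  # uniform prior when unseen

print(v_home("power_play"))  # 2 of the 3 "power_play" episodes end in a home goal
```

A neural network would replace this lookup table when states are continuous feature vectors, but the target it regresses toward is the same next-goal probability.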
Two points of view: 1. Cut the game at each "goal" event and train the NN on the resulting parts. 2. Don't cut the game and train the NN on the whole game. A related question: 3. how to define the reward.