Guiliang opened 7 years ago
Cut the game — Advantages: a) Easy to explain: the value V(s) is the expectation of the next reward. b) TD-Gammon, as a two-agent model, could be a good baseline.
Cut the game — Disadvantages: a) If we cut the game, it becomes discontinuous; it is no longer the original game. b) The feature "Gametime" could be confusing.
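The "cut the game" option above can be sketched as follows. This is a minimal, hypothetical illustration — the event dictionaries and field names (`type`, `team`) are assumptions, not the project's actual data format:

```python
# Hedged sketch: splitting a play-by-play event sequence into episodes,
# each one ending at a "goal" event. Event fields are hypothetical.

def cut_into_episodes(events):
    """Split a list of events into episodes, each ending at a goal."""
    episodes, current = [], []
    for event in events:
        current.append(event)
        if event["type"] == "goal":
            episodes.append(current)
            current = []
    if current:  # trailing events with no goal (e.g. end of period)
        episodes.append(current)
    return episodes

events = [
    {"type": "pass"}, {"type": "shot"}, {"type": "goal", "team": "home"},
    {"type": "pass"}, {"type": "goal", "team": "away"},
    {"type": "shot"},
]
episodes = cut_into_episodes(events)
print(len(episodes))  # 3: two episodes ending in goals, one trailing segment
```

The trailing segment with no goal is exactly the discontinuity problem mentioned in the disadvantages: those final events have no terminal reward.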
Don't cut the game — Advantages: a) It is the original game, so we can learn what the real game is. b) Gametime could be interesting: as time goes on, a team might have a higher expectation of scoring the next goal.
Don't cut the game — Disadvantages: a) Hard to define what has been learned — the expectation of the total reward obtained in a game? b) Learning becomes harder, since the score in a match also depends on the teams.
Don't cut the game — a) Away −1, Home +1: if the home team scores, we set reward = +1; if the away team scores, we set reward = −1. With this definition, the expected total reward (return) is the expected goal differential.
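Under this uncut-game definition, the claim that the return equals the goal differential can be checked directly. A minimal sketch, assuming a hypothetical event format (`type`, `team` fields are illustrations, not the project's real schema):

```python
# Hedged sketch of the uncut-game reward: +1 when the home team scores,
# -1 when the away team scores, 0 otherwise. Summing rewards over a
# whole game then yields the home-minus-away goal differential.

def reward(event):
    if event.get("type") == "goal":
        return 1 if event["team"] == "home" else -1
    return 0

game = [
    {"type": "shot"},
    {"type": "goal", "team": "home"},
    {"type": "goal", "team": "away"},
    {"type": "goal", "team": "home"},
]
goal_diff = sum(reward(e) for e in game)
print(goal_diff)  # 2 home goals - 1 away goal = 1
```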
Cut the game — a) Away +1, Home +1: the expected reward function represents the probability that, if play starts in state s, a random walk of unbounded length through the state space ends with a goal for the home team or for the away team.
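With each episode ending in a +1 reward tagged by the scoring team, a simple Monte Carlo estimate of the home value of a state is the fraction of episodes from that state that end in a home goal. A minimal sketch — the state labels and episode data here are hypothetical placeholders, not real features:

```python
# Hedged sketch of the cut-game view: V_home(s) estimated as the
# fraction of observed episodes starting from state s that end with a
# home goal. States are hypothetical string labels for illustration.
from collections import defaultdict

counts = defaultdict(lambda: {"home": 0, "away": 0})

# Hypothetical data: (start state, team that scored at episode's end).
episodes = [("power_play", "home"), ("power_play", "home"),
            ("power_play", "away"), ("even_strength", "away")]

for state, scorer in episodes:
    counts[state][scorer] += 1

def v_home(state):
    c = counts[state]
    total = c["home"] + c["away"]
    return c["home"] / total if total else 0.5  # uniform prior when unseen

print(v_home("power_play"))  # 2 of the 3 "power_play" episodes end in a home goal
```

A neural network would replace this lookup table when states are continuous feature vectors, but the target it regresses toward is the same next-goal probability.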
Two points of view: 1. Cut the game at each "goal" event and train the NN on the resulting parts. 2. Don't cut the game and train the NN on the whole game. A related question: 3. how to define the reward.