RasmusBrostroem / ConnectFourRL


Test training strategies for `TDAgent` and report results #103

Open jbirkesteen opened 1 year ago

jbirkesteen commented 1 year ago

We should test the `TDAgent` soon to get an idea of what works and what doesn't. I suggest we write a report, e.g. as a Jupyter notebook, in which we run the appropriate training scripts and record the results. It could include, or be accompanied by, a table like the one below with added columns for benchmarking results (and the Neptune ID).

These are the hyperparameters and elements of the environment that I suggest we try out to begin with. Any thoughts?

| epsilon | alpha | Reward system | Opponent |
| --- | --- | --- | --- |
| 0 | 0.1 | W=1, L=0 | self (freeze each episode) |
| 0 | 0.01 | W=1, L=0 | self (freeze each episode) |
| 0.1 | 0.01* | W=1, L=0 | self (freeze each episode) |
| 0.1 | 0.01* | W=1, L=-1 | self (freeze each episode) |
| 0.1 | 0.01* | W=1, L=-1 | Minimax |
| 0.1 | 0.01* | W=1, L=-1 | 2 simultaneously training |

(For the alpha values marked with *, I suggest we use whichever of 0.1 and 0.01 worked best.)

Also, how many games should we train for? Do you think 20k is enough?
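To keep the runs reproducible, the proposed table and the tentative 20k game count could be written down as a list of configurations that a training script loops over. This is only a sketch: `ExperimentConfig`, `build_experiments`, and the opponent labels are hypothetical names for illustration, not identifiers from this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """One row of the proposed table (hypothetical container)."""
    epsilon: float       # exploration rate
    alpha: float         # learning rate
    win_reward: float    # W in the table
    loss_reward: float   # L in the table
    opponent: str        # illustrative label, not a class name from the repo
    n_games: int = 20_000  # tentative; the issue asks whether 20k is enough


def build_experiments(best_alpha: float) -> list[ExperimentConfig]:
    """Return the six proposed runs; best_alpha fills the rows marked with *."""
    return [
        ExperimentConfig(0.0, 0.1, 1.0, 0.0, "self_frozen"),
        ExperimentConfig(0.0, 0.01, 1.0, 0.0, "self_frozen"),
        ExperimentConfig(0.1, best_alpha, 1.0, 0.0, "self_frozen"),
        ExperimentConfig(0.1, best_alpha, 1.0, -1.0, "self_frozen"),
        ExperimentConfig(0.1, best_alpha, 1.0, -1.0, "minimax"),
        ExperimentConfig(0.1, best_alpha, 1.0, -1.0, "two_training"),
    ]


# best_alpha would be chosen after comparing the first two runs.
experiments = build_experiments(best_alpha=0.01)
for run_id, cfg in enumerate(experiments, start=1):
    print(run_id, cfg)
```

Each run could then log its config alongside the benchmarking results, so the report table can be generated rather than filled in by hand.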

Before this test, we need to solve the following issues:

jbirkesteen commented 1 year ago

@RasmusBrostroem Should we experiment with the network architecture in this issue, too? I'm thinking of just adding one hidden layer to get the interaction effects we talked about. If so, which combinations of the other hyperparameters in the table do you think we should try it with?
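To make the hidden-layer idea concrete, here is a minimal pure-Python sketch of a value network with a single hidden layer. It is not the repository's `TDAgent`: the 6x7 board size, the {-1, 0, 1} cell encoding, the hidden size, and the `tanh`/sigmoid activations are all assumptions. The point is only that each hidden unit combines several cells, so the output can depend on interactions between positions rather than on each cell independently.

```python
import math
import random

N_CELLS = 6 * 7   # assumed standard Connect Four board
N_HIDDEN = 16     # hypothetical hidden layer size


def init_weights(n_in, n_out, rng):
    """Small random weights, one row of n_in weights per output unit."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def forward(board, w_hidden, w_out):
    """Map a flat board (42 values in {-1, 0, 1}) to a value in (0, 1)."""
    # Hidden layer: each unit is a nonlinear function of a weighted sum of cells.
    hidden = [math.tanh(sum(w * x for w, x in zip(row, board))) for row in w_hidden]
    # Output layer: sigmoid of a weighted sum of hidden activations (no biases).
    z = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)
w_hidden = init_weights(N_CELLS, N_HIDDEN, rng)
w_out = [rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)]

empty_board = [0] * N_CELLS
value = forward(empty_board, w_hidden, w_out)
print(value)
```

Because the sketch has no bias terms, an empty board gives a value of exactly 0.5, which is a convenient sanity check.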

RasmusBrostroem commented 1 year ago

@jbirkesteen I was also thinking about the lambda and gamma values, to see what effect they have. We could do some small exploratory studies with different values, or look at how hyperparameter optimization is done in practice.
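As a reminder of where lambda and gamma enter, here is a sketch of one TD(lambda) update for a linear value function with accumulating eligibility traces, followed by a small grid of values for an exploratory study. It assumes the agent uses a TD(lambda)-style update; the function name and the candidate values are hypothetical and not taken from the repository.

```python
from itertools import product


def td_lambda_step(weights, traces, features, reward, value, next_value,
                   alpha, gamma, lam):
    """One TD(lambda) update for a linear value function.

    gamma discounts the next state's value; gamma * lam controls how fast
    credit for earlier states decays in the eligibility traces.
    """
    td_error = reward + gamma * next_value - value
    # For a linear value function the gradient w.r.t. the weights is `features`.
    new_traces = [gamma * lam * e + x for e, x in zip(traces, features)]
    new_weights = [w + alpha * td_error * e for w, e in zip(weights, new_traces)]
    return new_weights, new_traces


# Hypothetical candidate values for a small exploratory study.
lambdas = [0.0, 0.5, 0.9]
gammas = [0.9, 1.0]
grid = list(product(lambdas, gammas))

# Single-step example: a win (reward 1) from a state with value estimate 0.
weights, traces = td_lambda_step(
    weights=[0.0, 0.0], traces=[0.0, 0.0], features=[1.0, 0.0],
    reward=1.0, value=0.0, next_value=0.0, alpha=0.1, gamma=1.0, lam=0.5,
)
print(grid)
print(weights, traces)
```

With lambda = 0 the traces reduce to the current features and the update becomes plain TD(0), so that value is a useful baseline in the grid.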

I also think the architecture is worth considering, both in terms of size and in terms of techniques, such as creating a new agent that sees the board as a matrix and applies convolutions to it to pick up patterns.

This might be too much though, so I would consider creating a new issue for the network architecture.
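For reference when that issue is created, the convolution idea can be illustrated without any deep learning library. The sketch below slides a hand-written 1x4 horizontal kernel over a 6x7 board (strictly speaking a cross-correlation, since the kernel is not flipped, which is also what most deep learning libraries compute). The board encoding and the kernel are assumptions; a trained convolutional agent would learn its kernels rather than have them fixed like this.

```python
def conv2d_valid(board, kernel):
    """Slide `kernel` over `board` with no padding; return the response map."""
    rows, cols = len(board), len(board[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out = []
    for r in range(rows - k_rows + 1):
        out_row = []
        for c in range(cols - k_cols + 1):
            out_row.append(sum(
                board[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            ))
        out.append(out_row)
    return out


# Assumed encoding: 1 = own piece, 0 = empty; row 5 is the bottom row.
board = [[0] * 7 for _ in range(6)]
for col in range(1, 4):
    board[5][col] = 1  # three own pieces in a row at the bottom

horizontal = [[1, 1, 1, 1]]  # responds to horizontal runs of own pieces
response = conv2d_valid(board, horizontal)
print(response[5])  # → [3, 3, 2, 1]
```

A response of 3 marks a four-cell window that already holds three own pieces, which is the kind of pattern a convolutional layer could learn to detect; vertical and diagonal kernels would work the same way.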