Open jbirkesteen opened 1 year ago
@RasmusBrostroem Should we experiment with the network architecture in this issue, too? I'm thinking about just adding one hidden layer to get the interactive effects we talked about? If so, which combination of other hyperparameters in the table do you think we should try it out with?
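To make the "one hidden layer" idea concrete, here's a minimal sketch of the forward pass (sizes, names, and the sigmoid output are placeholders I made up, not our actual code):

```python
import numpy as np

# Hypothetical sketch: value estimate with one hidden layer, so the
# network can capture interactions between board features. The input
# size, hidden size, and sigmoid activations are placeholders.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_inputs=198, n_hidden=40, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0, 0.1, (n_hidden, n_inputs)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0, 0.1, (1, n_hidden)),
        "b2": np.zeros(1),
    }

def value(params, x):
    h = sigmoid(params["W1"] @ x + params["b1"])        # hidden layer
    return sigmoid(params["W2"] @ h + params["b2"])[0]  # scalar in (0, 1)
```

The point is just that the hidden layer lets the estimate depend on feature combinations rather than a weighted sum of individual features.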
@jbirkesteen I was also thinking about the lambda and gamma values, to see what effect they have. We could do some small exploratory studies with different values, or look at how hyperparameter optimization is done in practice.
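A small exploratory study could just be a grid over the two values, something like this (the value lists and `train_and_benchmark` are placeholders for whatever training entry point we end up with):

```python
import itertools

# Hypothetical grids; the actual values are up for discussion.
LAMBDAS = [0.0, 0.4, 0.7, 0.9]
GAMMAS = [0.9, 0.95, 1.0]

def run_grid(train_and_benchmark):
    """Run every (lambda, gamma) combination and collect benchmark scores.

    train_and_benchmark is a stand-in for our real training script.
    """
    results = {}
    for lam, gamma in itertools.product(LAMBDAS, GAMMAS):
        results[(lam, gamma)] = train_and_benchmark(lam=lam, gamma=gamma)
    return results
```

If the grid gets too expensive, that's where looking at proper hyperparameter optimization (random search, Bayesian methods) would come in.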
I also think the architecture is worth thinking about, both in terms of size and in terms of techniques, such as creating a new agent that sees the board as a matrix and then does some convolutions on it to pick up patterns.
This might be too much though, so I would consider creating a new issue for the network architecture.
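For the convolution idea, the core operation would be sliding a small learned filter over the board matrix, roughly like this (shapes and the plain valid-mode cross-correlation are just an illustration, not a proposed implementation):

```python
import numpy as np

# Hypothetical sketch: treat the board as a 2D matrix and slide a small
# kernel over it, so the same learned pattern detector is applied at
# every board position. Shapes are placeholders.
def conv2d(board, kernel):
    kh, kw = kernel.shape
    bh, bw = board.shape
    out = np.zeros((bh - kh + 1, bw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(board[i:i + kh, j:j + kw] * kernel)
    return out
```

In practice we'd use a framework's conv layers rather than writing this by hand, but it shows why a matrix view of the board is a prerequisite.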
We should do some tests of the `TDAgent` soon and get an idea of what works and what doesn't. I suggest that we write some sort of report, e.g. as a Jupyter notebook, where we run the appropriate training scripts and write up the results. We could include/separately add a table like the one below, with added columns for benchmarking results (and Neptune id). (At the *, I suggest that we choose whichever of 0.1 and 0.01 worked best.)
Also, how many games should we train for? Do you think 20k is enough?
Before this test, we need to solve the following issues: