I think the bottleneck is still largely on the NN side, so one thing worth trying is reducing the network size.
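As a rough sketch of what "reduce the NN size" could mean, here is an illustrative slimmed-down encoder, assuming an Atari-style 84x84, 4-frame-stack input and a PyTorch agent. The channel counts and hidden size are hypothetical (roughly half of the usual Nature-CNN's 32/64/64 filters and 512-unit hidden layer) and would need tuning against the reward curve:

```python
import torch.nn as nn

# Hypothetical smaller encoder; sizes are illustrative, not tuned.
small_network = nn.Sequential(
    nn.Conv2d(4, 16, kernel_size=8, stride=4),   # was 32 filters
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=4, stride=2),  # was 64 filters
    nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, stride=1),  # was 64 filters
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 256),                  # was 512 hidden units
    nn.ReLU(),
)
```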
Alternatively, I noticed that learning rate annealing really seems to help the algorithm converge toward the end of training, so we could also try a smaller learning rate with annealing turned off.
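A minimal sketch of that comparison, assuming a training loop with a per-update linear anneal (the value 2.5e-4 is a common PPO default; 1e-4 is just an illustrative smaller constant):

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)  # stand-in for the agent network
base_lr = 1e-4           # illustrative smaller fixed LR (default is often 2.5e-4)
anneal_lr = False        # turn the linear anneal off
num_updates = 1000

optimizer = optim.Adam(model.parameters(), lr=base_lr)

for update in range(1, num_updates + 1):
    if anneal_lr:
        # linear decay from base_lr down to ~0 over training
        frac = 1.0 - (update - 1.0) / num_updates
        optimizer.param_groups[0]["lr"] = frac * base_lr
    else:
        optimizer.param_groups[0]["lr"] = base_lr
    # ... rollout collection and policy update would go here ...
```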
Maybe we could also tune the discount factor (and we should visualize the discounted returns, which are what the agent actually optimizes for).
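For the visualization part, something like the following would compute the discounted return per episode; `episode_rewards` is hypothetical data, and logging (TensorBoard, W&B, etc.) is left out:

```python
import numpy as np

def discounted_return(rewards, gamma):
    """Compute G = sum_t gamma^t * r_t for one episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Example: how the quantity the agent optimizes changes with gamma.
episode_rewards = np.ones(100)  # hypothetical per-step rewards
for gamma in (0.99, 0.995, 0.999):
    print(gamma, discounted_return(episode_rewards, gamma))
```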
Training an agent still takes a long time; the experiment in #36 took 4d 9h 11m 14s to finish.
Looking at the reward chart, the agent appears to reach about 70% of its final performance within just 50M steps (roughly 10 hours into training).
We should therefore try to optimize for a 10-hour computational budget.