boardlaw's grown like a ball of mud these past few months, and there are likely complexities that aren't adding much and could be thrown away.
League: this is the top of the list. It's a lot of complexity and I haven't actually seen any conclusive evidence of cyclic behaviour (but maybe that's because I've been using the league?)
LR warmup: right now the LR gets warmed up from zero over the first ~hundred steps to its (very high) 1e-2. Is this necessary? Is the 1e-2 actually any better than 3e-4?
Replay buffer: the biggest deviation of this implementation from classic AZ is the lack of a replay buffer. I did this based on OA5's appendix on stale samples. Was this a good choice? A replay buffer can be emulated easily enough by upping the buffer size, which ties into hyperparam tuning.
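For reference on the LR warmup item, here's a minimal sketch of the kind of schedule being questioned: a linear ramp from zero to the peak over the first ~hundred steps, then flat. The function name `warmup_lr` and its parameters are made up for illustration, and this isn't lifted from boardlaw's actual code.

```python
def warmup_lr(step, peak_lr=1e-2, warmup_steps=100):
    """Linear warmup from zero to peak_lr over warmup_steps, then constant.

    Hypothetical helper: the defaults mirror the numbers in the note above
    (peak of 1e-2, roughly a hundred warmup steps).
    """
    if warmup_steps <= 0:
        # No warmup at all: go straight to the peak LR.
        return peak_lr
    # Fraction of the warmup completed, capped at 1 once warmup is over.
    return peak_lr * min(1.0, step / warmup_steps)


if __name__ == "__main__":
    for step in (0, 25, 50, 100, 1000):
        print(step, warmup_lr(step))
```

Testing whether the warmup matters then comes down to comparing a run with `warmup_steps=0` against the default, and the 1e-2 vs 3e-4 question is just a change to `peak_lr`.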
Did this ad hoc over the past few weeks. The league and LR warmup both fell away, and the replay buffer hasn't proved itself necessary. Might need to resurrect this if reviewers complain, à la #5.