Suppose we attempt to train online in tandem with a networked PC. How do we collect experience and communicate it (to a replay-buffer-style system, for example)? How do we return gradients: do we rebuild and re-flash the FPGA, or write to registers? What is the right timing for that?
This issue likely overlaps somewhat with #12.