Monte Carlo vs TD

"The next most obvious advantage of TD methods over Monte Carlo methods is that they are naturally implemented in an online, fully incremental fashion. With Monte Carlo methods one must wait until the end of an episode, because only then is the return known, whereas with TD methods one need wait only one time step."

But in our application, we don't need to wait until the end of the episode: each action has a return.

Maybe it is possible to apply Monte Carlo to this problem.
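To make the quoted difference concrete, here is a minimal sketch of the two tabular value updates. It is not code from this repository: the names `mc_update` and `td0_update`, the `(state, reward)` episode format, and the default `alpha` and `gamma` values are all assumptions chosen for illustration.

```python
from collections import defaultdict


def mc_update(V, episode, alpha=0.1, gamma=1.0):
    """Every-visit, constant-alpha Monte Carlo update (illustrative sketch).

    `episode` is a list of (state, reward) pairs, where `reward` is the
    reward received after leaving `state`. The whole episode must be
    available, because the return G is only known once it has ended.
    """
    G = 0.0
    # Walk backwards so G accumulates the discounted return from each step.
    for state, reward in reversed(episode):
        G = reward + gamma * G
        V[state] += alpha * (G - V[state])


def td0_update(V, state, reward, next_state, done, alpha=0.1, gamma=1.0):
    """TD(0) update (illustrative sketch): needs only a single transition."""
    # Bootstrap from the current estimate of the next state unless terminal.
    target = reward + (0.0 if done else gamma * V[next_state])
    V[state] += alpha * (target - V[state])


# Hypothetical two-step episode: a -> b -> terminal.
V_mc = defaultdict(float)
mc_update(V_mc, [("a", 1.0), ("b", 2.0)])  # applied after the episode ends

V_td = defaultdict(float)
td0_update(V_td, "a", 1.0, "b", done=False)  # applied right after step 1
td0_update(V_td, "b", 2.0, None, done=True)  # applied right after step 2
```

In this sketch, when a transition is terminal (`done=True`) the TD(0) target is just the observed reward, which is the same target Monte Carlo would use for a one-step episode. So if every action's return really is available immediately, the two updates would coincide for that case.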

Bahador-Bakhshi / 5G-Federation

Monte Carlo vs TD #13