andreasostling closed 10 months ago
I'm not sure I'm following completely, is this what you meant?
The term "return" means two things here. First, the return in RL is the aggregation of the reward signals (i.e., we want to maximize the return). Second, we use "return" for the output of the function. Does this make more sense?
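To make the two meanings concrete, here is a minimal sketch. It assumes the return is a (possibly discounted) sum of per-step rewards; `compute_return`, `gamma`, and the reward values are hypothetical names chosen for illustration, not taken from the code under discussion.

```python
def compute_return(rewards, gamma=1.0):
    """Aggregate per-step rewards into a single return (the RL sense).

    `gamma` is an assumed discount factor; gamma=1.0 gives a plain sum.
    """
    total = 0.0
    # Walk backwards so that each earlier step applies one more factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    # This `return` is the Python keyword: the function outputs the value.
    return total


episode_rewards = [1.0, 0.0, 2.0]  # illustrative reward signals
undiscounted = compute_return(episode_rewards)
discounted = compute_return(episode_rewards, gamma=0.5)
```

Here the function outputs the return: the rewards are the per-step signals, and the return is their aggregate.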
Ahhh, I see. Okay yeah that makes sense. Let's just leave it as it was then.
The return is the aggregated reward, so we should not change "return" to "reward". But maybe we could reword the text to say that the function should output xyz instead of returning it?