A quick question. While using the code for some preliminary RL experiments, I found this in main.py:
```python
# FIXME: works only for environments with sparse rewards
for idx, eps_done in enumerate(done):
    if eps_done:
        episode_rewards.append(reward[idx])
```
Is this only used to collect episode rewards for displaying statistics, with nothing to do with the training itself?

Does the FIXME mean that the intent is to collect the accumulated reward of the whole episode that just ended, but the current code assumes the reward at the last step equals the accumulated reward?
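If I'm reading it right, here is a minimal sketch of what I would expect the corrected bookkeeping to look like: accumulate the per-step reward into a per-env running sum and append that sum when an episode ends. Names like `num_envs` and `running_returns` are my own placeholders, not code from this repo:

```python
import numpy as np

num_envs = 4                          # hypothetical vec-env size, not from main.py
running_returns = np.zeros(num_envs)  # per-env accumulated return (my own name)
episode_rewards = []

# Placeholder step outputs so the snippet runs standalone; in main.py these
# would come from the vectorized environment at each step.
reward = np.array([0.1, 1.0, -0.2, 0.0])
done = np.array([False, True, False, False])

# Inside the rollout loop, after each env step:
running_returns += reward                  # accumulate dense per-step rewards
for idx, eps_done in enumerate(done):
    if eps_done:
        episode_rewards.append(running_returns[idx])  # log the full episode return
        running_returns[idx] = 0.0                    # reset for the next episode
```

For a sparse-reward environment (all reward at the final step), this reduces to the current behavior, which would explain the FIXME.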
Thanks!