Closed CeHao1 closed 1 year ago
Hi Ce Hao, I ran into the same problem and could not find the definition of success rate either. Have you solved it?
Thank you very much.
Hey, sorry I missed this! In the maze, the agent receives a reward only when it is close to the target; see the reward definition here: https://github.com/kpertsch/d4rl/blob/master/d4rl/pointmaze/maze_model.py#L135 Note that the episode currently does not terminate when the agent reaches the goal, so the agent is incentivized to get there quickly: it can keep collecting rewards for the remainder of the episode by staying close to the goal.
The success rate in the paper counts an episode as successful if the agent reaches the goal at some point in the episode, i.e. if the episode reward is >= 1.
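As a minimal sketch of this definition (not the code used for the paper): assuming each evaluation episode is available as a list of per-step rewards, the success rate is the fraction of episodes whose summed reward is at least 1. The helper names `episode_success` and `success_rate` are hypothetical and do not exist in the repo.

```python
from typing import Sequence


def episode_success(step_rewards: Sequence[float], threshold: float = 1.0) -> bool:
    """An episode is a success if its total reward reaches the threshold.

    With the sparse maze reward, a total >= 1 means the agent was close
    to the goal at some point during the episode.
    """
    return sum(step_rewards) >= threshold


def success_rate(episodes: Sequence[Sequence[float]], threshold: float = 1.0) -> float:
    """Fraction of episodes that count as successful."""
    if not episodes:
        raise ValueError("need at least one episode to compute a success rate")
    n_success = sum(episode_success(ep, threshold) for ep in episodes)
    return n_success / len(episodes)
```

Because the episode does not terminate at the goal, the summed reward can be much larger than 1; the threshold only checks whether the goal was reached at all.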
I hope this answers your questions!
Is the setting the same for the kitchen env? Does "Reward" directly represent the completed subgoals?
Thank you very much!
Yes, that's correct!
Thank you, kpertsch. I understand the definition and agree it is reasonable.
However, when I try to reproduce the results of open-loop SPiRL in Maze Navigation (Figure 4 in the SPiRL paper), the policy may not converge with random seeds 0 and 3. Although the agent frequently reaches the target, the success rate does not get close to 100% within 1.5M steps; it actually stays below 10%. With some other seeds, the success rate exceeds 90%, so the performance is very sensitive to the random seed.
I also ran the experiment in Figure 13, which uses a simpler target. There the policy converged for all 5 seeds and the agent always reached the target.
Thanks for the explanation. I also plan to enable fine-tuning of the low-level decoder to allow more skill adaptation and exploration. Thanks a lot!
Hi, I am Ce Hao, and I am trying to reproduce the results of the SPiRL paper with your code.
In Figure 4 of the paper, the success rate of Maze Navigation reaches almost 1 after 1M steps.
However, the wandb logger has no variable called 'success rate', so I presume it is a derived quantity. The definition I assumed is: in each epoch (50 episodes), if at least one reward is > 1, meaning the agent reached the target at least once, the epoch counts as successful. We then compute the mean and standard deviation of the success rate over 3 seeds.
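The aggregation over seeds mentioned above could be sketched as follows. This is only an illustration: it assumes the per-seed success rates are already available as plain floats, the function name `aggregate_over_seeds` is hypothetical, and using the population standard deviation is an assumption, since it is not stated which variant the paper uses.

```python
import statistics
from typing import Sequence, Tuple


def aggregate_over_seeds(per_seed_rates: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, std) of the success rate across random seeds.

    Assumption: population standard deviation (statistics.pstdev).
    Swap in statistics.stdev for the sample standard deviation.
    """
    if not per_seed_rates:
        raise ValueError("need at least one seed")
    mean = statistics.mean(per_seed_rates)
    std = statistics.pstdev(per_seed_rates)
    return mean, std
```

With only a few seeds, a single run that fails to converge moves both the mean and the standard deviation a lot, so it helps to report the per-seed values as well.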
However, my actual experiments look different. As Figure 5 also shows for SPiRL (Ours), the agent still explores many other places instead of converging to the direct path to the goal. In my reproduction, fewer than 20% of trajectories eventually reach the target.
I want to develop a new algorithm on top of the SPiRL baseline, so could you please explain how the success rate for Maze Navigation is defined? Thanks!
Best, Ce Hao