As the paper notes, the experimental results are averaged over six random seeds. May I ask how many eval_episodes were used for each method (DV2, CQL, etc.) in the evaluation phase? I ask because I found the visual-input setting (V-D4RL) to be more unstable than the proprioceptive-state setting (D4RL).