Hi, I'm wondering whether there is a standard training-step budget for comparing offline RL algorithms on the D4RL datasets. Many papers mention training for 1e6 gradient steps. Is it required to use exactly 1e6 steps for a fair comparison, or can I let the agent train for more steps than that?