Closed daihuiao closed 1 year ago
Hi there, I think this is because the D4RL datasets contain trajectories without 'timeout' limits, which differs from the default gym env (usually T=1000). This helps the agent learn where the true terminals are.
For more details, please refer to the D4RL repo.
Thank you for your reply.
I printed the reward information in the dataset, but I got a particularly huge return, "Max return: 3780163.00, min: -6.61", and the trajectory length was 768445. Is there something wrong with the dataset I used? (env_name is walker2d-medium-expert-v2)
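A return and trajectory length that large usually mean the rewards were summed over the whole flat dataset without splitting it into episodes. A minimal sketch of per-trajectory returns, assuming the standard D4RL field names (`rewards`, `terminals`, `timeouts` from `env.get_dataset()`); the function and synthetic data below are illustrative, not part of D4RL itself:

```python
import numpy as np

def trajectory_returns(rewards, terminals, timeouts):
    """Split flat D4RL-style arrays into per-trajectory returns.

    An episode ends wherever either `terminals` or `timeouts` is set;
    without this split, summing `rewards` end-to-end produces a single
    huge "return" spanning the entire dataset.
    """
    returns, total, steps = [], 0.0, 0
    for r, done in zip(rewards, np.logical_or(terminals, timeouts)):
        total += r
        steps += 1
        if done:
            returns.append(total)
            total, steps = 0.0, 0
    if steps:  # trailing partial trajectory, if any
        returns.append(total)
    return returns

# Synthetic example: two episodes, the first ending via terminal,
# the second via timeout.
rewards = np.array([1.0, 2.0, 3.0, 4.0])
terminals = np.array([0, 0, 1, 0], dtype=bool)
timeouts = np.array([0, 0, 0, 1], dtype=bool)
print(trajectory_returns(rewards, terminals, timeouts))  # [6.0, 4.0]
```

With real data, `max(returns)` and `min(returns)` computed this way should land in the normal walker2d score range rather than in the millions.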