Farama-Foundation / D4RL

A collection of reference environments for offline reinforcement learning
Apache License 2.0

Question about Maximum Score and Expert Dataset #151

Open liziniu opened 2 years ago

liziniu commented 2 years ago

Hi,

How is the maximum score obtained for the MuJoCo tasks? From the wiki (https://github.com/rail-berkeley/d4rl/wiki/Dataset-Reproducibility-Guide#gym-mujocogym-bullet), it seems that the expert dataset is collected with the stochastic SAC policy. In rlkit, however, SAC is evaluated with its deterministic policy, and evaluating with the stochastic policy typically gives performance that is not very good. I am therefore not sure whether the reported maximum score is based on the deterministic policy or the stochastic policy.

If the reported score is based on the deterministic policy, should the expert dataset also be collected with the deterministic policy?
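
To make the distinction concrete, here is a minimal sketch of the two evaluation modes for a tanh-Gaussian policy of the kind SAC uses. It is not the D4RL or rlkit code: `select_action`, `toy_return`, and `evaluate` are hypothetical names, and the one-step "environment" is a made-up reward (negative squared distance to a target action) chosen only so the example runs on its own.

```python
import numpy as np


def select_action(mean, log_std, rng, deterministic):
    """Return a tanh-squashed action from a Gaussian policy head.

    deterministic=True uses the mean; otherwise a sample is drawn.
    """
    if deterministic:
        return np.tanh(mean)
    noise = rng.standard_normal(mean.shape)
    return np.tanh(mean + np.exp(log_std) * noise)


def toy_return(action, target):
    # Hypothetical one-step reward: negative squared distance to the target.
    return -float(np.sum((action - target) ** 2))


def evaluate(mean, log_std, target, deterministic, episodes=1000, seed=0):
    """Average the toy return over several one-step episodes."""
    rng = np.random.default_rng(seed)
    returns = [
        toy_return(select_action(mean, log_std, rng, deterministic), target)
        for _ in range(episodes)
    ]
    return float(np.mean(returns))


# Assumed policy output: the mean is exactly on target, with some action noise.
target = np.array([0.5, -0.3])
mean = np.arctanh(target)
log_std = np.full(2, -1.0)

det_score = evaluate(mean, log_std, target, deterministic=True)
sto_score = evaluate(mean, log_std, target, deterministic=False)
print("deterministic:", det_score, "stochastic:", sto_score)
```

In this toy setting the mean action is optimal, so sampling can only lower the average return. That is the gap I am asking about: a maximum score measured with one mode and a dataset collected with the other would not describe the same behavior.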

I would highly appreciate any help.

AsadJeewa commented 2 years ago

Interested to hear from the team on this as well.