Farama-Foundation / D4RL-Evaluations

Clarification about Training and Evaluation Task Split #9

Status: Closed (rasoolfa closed this issue 4 years ago)

rasoolfa commented 4 years ago

Hi,

Thanks for sharing this repository; it is great. I'd like to ask about the "Training and Evaluation Task Split" in Appendix D and how the results in Tables 1 and 3 are reported. I am a bit confused about how this was done. For simplicity, let's assume BCQ and Maze2D are being used. Which of the following correctly describes what was done in the paper?

  1. BCQ is trained on "maze2d-umaze-v1". Then the learned model is used to report results on "maze2d-eval-umaze-v1"? In other words, maze2d-eval-umaze-v1 is not used for training and is only used to report results?

  2. BCQ's hyperparameters are tuned on "maze2d-umaze-v1". Then BCQ is trained with those hyperparameters on "maze2d-eval-umaze-v1" and evaluated on it as well? In other words, maze2d-eval-umaze-v1 is used for both training and evaluation?

  3. Or any other scenario?
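
To make the first two readings concrete, here is a rough sketch of each. The `train`, `tune`, and `evaluate` helpers are hypothetical placeholders that only record which task was used; they are not functions from this repository or from d4rl, and no real training happens.

```python
# Hypothetical stand-ins: none of these helpers exist in D4RL-Evaluations or d4rl.
# They only record which task each step touched, to contrast the two readings.

TRAIN_TASK = "maze2d-umaze-v1"
EVAL_TASK = "maze2d-eval-umaze-v1"


def tune(algo, task):
    """Pretend to select hyperparameters for `algo` using `task` only."""
    return {"tuned_on": task}


def train(algo, task, hparams):
    """Pretend to train `algo` on the offline dataset of `task`."""
    return {"algo": algo, "trained_on": task, "hparams": hparams}


def evaluate(policy, task):
    """Pretend to roll out `policy` in `task`; returns a record, not a score."""
    return {
        "tuned_on": policy["hparams"].get("tuned_on"),
        "trained_on": policy["trained_on"],
        "evaluated_on": task,
    }


def scenario_1():
    # Reading 1: train on the training task, report on the held-out eval task.
    policy = train("BCQ", TRAIN_TASK, hparams={})
    return evaluate(policy, EVAL_TASK)


def scenario_2():
    # Reading 2: tune on the training task, then train AND evaluate on the eval task.
    hparams = tune("BCQ", TRAIN_TASK)
    policy = train("BCQ", EVAL_TASK, hparams)
    return evaluate(policy, EVAL_TASK)


if __name__ == "__main__":
    print("scenario 1:", scenario_1())
    print("scenario 2:", scenario_2())
```

The only difference between the two is which task is passed to `train`: the training task in the first reading, the evaluation task in the second.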

Thanks for your help.

rasoolfa commented 4 years ago

I have asked this in the d4rl repo instead, which I believe is the more relevant place.