YeWR / EfficientZero

Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.
GNU General Public License v3.0

Question about getting zero test score when I try to run EfficientZero on BabyAI grid environment #37

Open jiachengc opened 1 year ago

jiachengc commented 1 year ago

Hello, first of all, thanks for your amazing work on EfficientZero.

I tried to adapt EfficientZero to a BabyAI environment, "PutNextLocal", but it keeps giving me a test score of 0 throughout the 100k-step training process.

I made several modifications in order to adapt to BabyAI "PutNextLocal" env:

  1. I created a directory for the env at config/babyai and implemented BabyAIConfig(BaseConfig). I left every parameter at its Atari default and only changed line 101 of config/babyai/__init__.py from (image_channel,96,96) to (image_channel,7,7).
  2. Changed the class name from AtariWrapper(Game) to BabyAIWrapper(Game), leaving everything else at the default settings.
  3. Commented out lines 103 to 111, since the grid game does not have ale.
  4. Also commented out lines 235 to 237 in core/utils.py: https://github.com/YeWR/EfficientZero/blob/main/core/utils.py#L235
  5. Also modified my bash file as shown: [screenshot: Screen Shot 2022-11-07 at 1 11 20 PM]

After running the program with the same default parameter settings as for Atari, TensorBoard shows: [screenshots: Screen Shot 2022-11-07 at 1 16 14 PM, Screen Shot 2022-11-07 at 1 16 31 PM, Screen Shot 2022-11-07 at 1 16 44 PM]

Do you have any suggestions on how to modify the code correctly so that the program produces reasonable results on BabyAI "PutNextLocal"?

Thank you so much, and I look forward to hearing from you.

YeWR commented 1 year ago

Hi, I noticed that you changed "(image_channel,96,96) to (image_channel,7,7)". Is the observation shape of your environment 7x7? Note that there is a down-sampling network in model.py, which down-samples the Atari observations from 96x96 to 6x6.

Since the observation shape of your env is 7x7, I think you have to redesign the models, especially the representation model. For example, remove the down-sampling network and use some plain convnets instead. Reducing the number of channels in the model may also help, since the observation is much smaller.
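To illustrate why the down-sampling network is a problem for a 7x7 input, here is a minimal sketch of the spatial-size arithmetic. It assumes the 96x96 to 6x6 reduction comes from four stride-2 stages, each with kernel size 3 and padding 1; that layout is an assumption made for illustration, so check model.py for the actual layers. The helper names `out_size` and `downsampled_size` are hypothetical.

```python
def out_size(size, kernel=3, stride=2, padding=1):
    """Spatial output size of one conv/pool stage (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def downsampled_size(size, stages=4):
    """Apply the assumed stride-2 stage `stages` times in a row."""
    for _ in range(stages):
        size = out_size(size)
    return size


if __name__ == "__main__":
    # Atari-sized input: 96 -> 48 -> 24 -> 12 -> 6
    print(downsampled_size(96))
    # BabyAI-sized input: 7 -> 4 -> 2 -> 1 -> 1
    print(downsampled_size(7))
```

Under these assumptions a 7x7 observation collapses to a single 1x1 cell, so almost all spatial information about the grid is lost before the rest of the network sees it. That is consistent with the suggestion to drop the down-sampling stages and use stride-1 convolutions for small observations.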

jiachengc commented 1 year ago

Thanks for your reply. Do you have any idea why the output dimension of the value prediction network is set to 601? [screenshot: Screen Shot 2022-11-30 at 2 46 28 PM] Is there any specific meaning behind this number? I cannot find a reason in the paper.