Mehooz / BIRD_code

Code for paper "Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning".

Keep getting Invalid one hot action error #1

Closed: nickuncaged1201 closed this 3 years ago

nickuncaged1201 commented 3 years ago

I tried a fresh environment with the specified requirements. Since I'm on Windows 10 and dm-control has issues installing there, I went with OpenAI Gym and the provided Atari options. But I can never get past the ValueError "Invalid one-hot action", which is raised in wrapper.py around lines 359 to 362. Could you explain why the one-hot encoded action needs to be np.allclose to the original action? I couldn't quite figure out its purpose.
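For context, the check being discussed resembles the one-hot action wrapper in Dreamer-style codebases. Here is a minimal sketch of that pattern; the class and attribute names are illustrative assumptions, not the actual BIRD wrapper.py:

```python
import numpy as np

class OneHotAction:
    """Sketch of a Dreamer-style one-hot action wrapper (names assumed)."""

    def __init__(self, env):
        self.env = env

    def step(self, action):
        # Rebuild a reference one-hot vector from the argmax of the input.
        index = int(np.argmax(action))
        reference = np.zeros_like(action)
        reference[index] = 1
        # If the input was not exactly one-hot, it differs from the rebuilt
        # reference and the wrapper refuses it.
        if not np.allclose(reference, action):
            raise ValueError(f"Invalid one-hot action:\n{action}")
        # The wrapped discrete env expects an integer action index.
        return self.env.step(index)
```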

I tried just commenting out this section and letting the code run, but after training for around 1 million steps I noticed something isn't right: the image loss and model loss are basically unchanged throughout training, and all entropy loss terms show infinity. I assume this is caused by the action handling, but I couldn't quite figure it out.

Mehooz commented 3 years ago

Hi, thanks for raising this. Since we didn't run our comparison on Atari, I have no experience launching the code there. The Atari code is there because we adapted our code from Dreamer's repo. In my opinion, np.allclose is just there to make sure the action is indeed a one-hot vector before it is used. It's worth noting that neither Dreamer nor BIRD focuses on discrete environments; DreamerV2 does, so you might try their code directly to get a more reasonable Atari baseline. Besides, I think 1M steps on Atari is far from enough to see anything in terms of performance.
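To illustrate why the comparison works (a minimal sketch, not taken from the repo): rebuilding a one-hot vector from the argmax and comparing with np.allclose rejects anything that is not exactly one-hot, such as raw action probabilities passed in by mistake.

```python
import numpy as np

def is_one_hot(action):
    # Only an exact one-hot vector equals its argmax reconstruction.
    reference = np.zeros_like(action)
    reference[np.argmax(action)] = 1
    return np.allclose(reference, action)

print(is_one_hot(np.array([0.0, 1.0, 0.0])))   # True: valid one-hot action
print(is_one_hot(np.array([0.2, 0.7, 0.1])))   # False: probabilities, not one-hot
```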