biggzlar / i2a

Continuation of the credited project, based on Imagination-Augmented Agents for Deep Reinforcement Learning (https://arxiv.org/abs/1707.06203). This version is implemented with A2C/PPO, as opposed to A3C.

Thank you for your code. Can this code be applied to other environments, like other Atari games? #1

zhaoyingnan179346 opened this issue 6 years ago

biggzlar commented 6 years ago

Hi there. Right now, this is not fully implemented but it should not be too hard to integrate.

  1. You would need to build your own data generator, i.e. a small program that generates screenshots of your Atari game of choice and saves them in a format that can be understood by the rest of the code. If you have a look at the file gen_data.py, you can see what the format should be: essentially 3 input frames and 1 target frame per datapoint, saved in .npz format (see the sketch after this list). This is what the environment model trains on.

  2. Once you have trained the environment model on a lot of screenshots from your game of choice, you can run train.py to train the agent. Simply switch out the parameter in line 49, making sure to pick a valid OpenAI Gym environment.
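For illustration, a single datapoint in that format might be written and read back like this. The array key names and the 84x84 frame size here are assumptions for the sketch, not the repo's actual conventions; gen_data.py is the authoritative reference:

```python
import numpy as np

# Hypothetical datapoint: 3 consecutive input frames plus 1 target frame.
# Key names ("inputs", "target") and the grayscale 84x84 shape are assumed.
inputs = np.zeros((3, 84, 84), dtype=np.uint8)
target = np.zeros((84, 84), dtype=np.uint8)
np.savez("datapoint_0000.npz", inputs=inputs, target=target)

# Reading it back for environment-model training:
data = np.load("datapoint_0000.npz")
frames, next_frame = data["inputs"], data["target"]
```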

I don't know if I will have the time to implement this myself, but maybe you would like to try and create a pull request. Hope this helps a little.

zhaoyingnan179346 commented 6 years ago

Thank you for your reply; I haven't read the code carefully yet. I have another two questions:

  1. How long does it take to train the I2A agent (including the environment model and the I2A agent)?
  2. Does I2A exceed A3C or PPO on the Atari game Frostbite?

biggzlar commented 6 years ago

No problem. For your questions:

  1. It depends heavily on the system. Pretraining the environment model should not take too long, but training the agent can take 10+ hours before you see first results.
  2. This still uses A3C; the difference is that instead of only inputting a single frame from the game, we also have a second track where we input encodings of "imagined" future trajectories (see the sketch after this list). I only continued working on this implementation (i.e. I did not build it myself), and in all honesty, it is not much better than the baseline (without the environment model), which is likely an issue of the code, not the general idea. In most cases, model-based approaches excel in environments like Frostbite, which require some planning, so I think this might be an issue with how the environment model is trained; there is probably some more work to do there.
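To make the two-track idea concrete, here is a schematic Python sketch. None of these function names come from this repo; the stubs only stand in for the real networks, and the feature sizes are arbitrary:

```python
import numpy as np

def model_free_features(frame):
    """Stub for the model-free CNN path (arbitrary 256-dim features)."""
    return frame.reshape(-1)[:256].astype(np.float32)

def imagine_rollout(frame, action, env_model, depth=3):
    """Roll the learned environment model forward `depth` steps for one action."""
    trajectory = []
    for _ in range(depth):
        frame = env_model(frame, action)  # predicted next frame
        trajectory.append(frame)
    return trajectory

def encode_rollout(trajectory):
    """Stub rollout encoder: flatten and truncate each imagined frame."""
    return np.concatenate([f.reshape(-1)[:32].astype(np.float32) for f in trajectory])

def i2a_input(frame, actions, env_model):
    """Concatenate the model-free path with one encoding per imagined action."""
    free = model_free_features(frame)
    imagined = np.concatenate(
        [encode_rollout(imagine_rollout(frame, a, env_model)) for a in actions])
    return np.concatenate([free, imagined])  # joint input to the policy/value heads
```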
zhaoyingnan179346 commented 6 years ago

Thanks, I think so. I just read your gen_data.py and generator.py, and they left me confused; the code is complicated. It seems that you generate the game frames manually instead of using the gym simulator (for example, using code like frame = env.step() to generate frames). Is that true?

biggzlar commented 6 years ago

Yes, you are correct. As mentioned, I only added to this implementation - I did not build it from the ground up. It would have been much easier (and more flexible) to build the generator using gym. That is definitely a required feature.
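For reference, a gym-based generator could look roughly like the sketch below. This is a minimal sketch assuming gym's classic step API (obs, reward, done, info); the environment name, key names, and the choice to save everything in one .npz file are placeholders, and how samples are actually split across files should follow whatever gen_data.py expects:

```python
import numpy as np
import gym

def generate_dataset(env_name="FrostbiteNoFrameskip-v4", n_samples=1000,
                     out_path="dataset.npz"):
    """Collect (3 input frames, 1 target frame) samples with a random policy."""
    env = gym.make(env_name)
    frames = [env.reset()]
    inputs, targets = [], []
    while len(inputs) < n_samples:
        obs, reward, done, info = env.step(env.action_space.sample())  # classic gym API
        frames.append(obs)
        if len(frames) >= 4:
            inputs.append(np.stack(frames[-4:-1]))  # 3 consecutive input frames
            targets.append(frames[-1])              # the frame that follows them
        if done:
            frames = [env.reset()]
    np.savez(out_path, inputs=np.array(inputs), targets=np.array(targets))
    env.close()

# e.g. generate_dataset("PongNoFrameskip-v4", n_samples=5000)
```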

zhaoyingnan179346 commented 6 years ago

Thanks, that helps a lot. I hope to be in touch with you again!

biggzlar commented 6 years ago

Anytime, and thanks for your interest! I would be glad to see this completed. Likewise!