amidos2006 / gym-pcgrl

An OpenAI Gym interface for "Procedural Content Generation via Reinforcement Learning".
MIT License

Fully conv policy #4

Closed smearle closed 4 years ago

smearle commented 4 years ago
amidos2006 commented 4 years ago

The inference shouldn't be in the same file as training :) I was going to change run.py to train.py and make another one called inference.py

amidos2006 commented 4 years ago

Also, out of curiosity, why did you need to implement another ActorCritic instead of just extending FeedForwardPolicy and returning the CNN output from the flattened wide observation?

smearle commented 4 years ago

Very true, I can split it into those two separate files now. I extended ActorCritic to avoid the two dense/fully-connected layers in FeedForwardPolicy. This way, we could let an agent trained on a small map do inference (or continue training) on a bigger one. Also, those linear layers would get pretty expensive if they receive input and produce output the size of the entire map, as is the case in the FullyConv NN and the wide representation. It looks like binary_wide w/ 0.2 change and a 48-length path is able to match the performance of turtle. I was going to work on some inference code that could produce the kind of graphs you sent the group.
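The size-agnostic property described above can be sketched with a minimal NumPy fully-convolutional head. This is an illustrative toy, not the actual gym-pcgrl or stable_baselines code; all names and layer sizes here are made up:

```python
import numpy as np

def conv2d(x, w):
    """Naive stride-1 2D convolution with 'same' zero padding.
    x: (H, W, cin) feature map, w: (kh, kw, cin, cout) weights."""
    kh, kw, cin, cout = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    H, W = x.shape[:2]
    out = np.zeros((H, W, cout))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + kh, j:j + kw, :]
            out[i, j] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
w1 = rng.normal(size=(3, 3, 2, 8))  # shared conv weights, fixed size
w2 = rng.normal(size=(3, 3, 8, 1))  # one logit per tile (wide action space)

def fully_conv_logits(obs):
    """No dense layers: output grid always matches the input map size."""
    h = np.maximum(conv2d(obs, w1), 0)  # ReLU
    return conv2d(h, w2)[..., 0]        # per-tile action logits

small = rng.normal(size=(8, 8, 2))
big = rng.normal(size=(16, 16, 2))
assert fully_conv_logits(small).shape == (8, 8)
assert fully_conv_logits(big).shape == (16, 16)  # same weights, bigger map
```

Because every layer is convolutional, the same weight tensors apply unchanged to an 8x8 or 16x16 map, which is what makes train-small/infer-big possible in principle.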

amidos2006 commented 4 years ago

I don't fully get the idea, but I think you mean that FeedForwardPolicy will by default add the fully connected layers for the actor and critic :) About training on a small map and testing on a bigger one: that's not part of this project, so we'd need to think about it :) It's an interesting problem, but we shouldn't concern ourselves with it here :)

smearle commented 4 years ago

Yes, that's right: we're just skipping those linear layers. Perhaps there is a more elegant way to do this with stable_baselines? I also tend to dislike it when part of the NN is hidden from view. I agree that we don't need to look at upscaling in this paper. Right now the benefit is just skipping unnecessary layers to save training time in the wide representation, and losing those layers does not seem to hurt performance.
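The cost argument can be made concrete with a rough parameter count: a dense head on a flattened full-map observation grows quadratically with map area, while a convolutional head stays constant. The map and channel sizes below are illustrative, not taken from the gym-pcgrl config:

```python
# Rough parameter-count comparison (illustrative sizes).
H, W, C = 16, 16, 8
n_actions = H * W  # one action per tile (wide representation)

# Dense head: flatten the (H, W, C) feature map, then one fully
# connected layer mapping features -> per-tile actions, plus biases.
dense_params = (H * W * C) * n_actions + n_actions

# Fully conv head: a single 3x3 convolution producing one logit
# per tile, plus one bias; independent of the map size.
conv_params = 3 * 3 * C * 1 + 1

print(dense_params)  # 524544
print(conv_params)   # 73
```

Doubling the map side quadruples both the input and output of the dense layer (a 16x parameter blow-up), while the conv head's 73 parameters are unchanged.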