benelot closed this issue 7 years ago
This trains using the OpenAI Baselines DQN; it saves its data in the current folder by default. Also, it looks like its pkl files are not fully portable (across platforms, Python versions, and 32-bit/64-bit builds).
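For reference, this is roughly what the 2017-era Baselines DQN example looks like; the hyperparameters and file name are illustrative, and the point is that the trained policy ends up pickled in the working directory:

```python
import gym
from baselines import deepq

env = gym.make("CartPole-v0")
act = deepq.learn(
    env,
    q_func=deepq.models.mlp([64]),
    lr=1e-3,
    max_timesteps=100000,
    buffer_size=50000,
    exploration_fraction=0.1,
    exploration_final_eps=0.02,
    print_freq=10,
)
# Pickle-based save: not portable across platforms or Python versions.
act.save("cartpole_model.pkl")
```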
I prefer TF and Gym with lightweight RL algorithms such as DQN, DDPG, PPO, etc., and OpenAI Baselines fits that bill. I haven't looked at KerasRL yet; I am not too eager to add a lot of complexity and duplication, but it might be interesting.
I wonder how KerasRL and OpenAI Baselines (DQN etc.) compare?
As soon as OpenAI Baselines offers more than just DQN, it is definitely going to be a great library, especially because they try to tune their implementations for good, general performance. For now it is a bit limited: for harder environments DQN is going to be hard to train, since it does not handle continuous-valued actions. Having at least DDPG would be especially helpful.
The great thing about KerasRL is that it reduces clutter when programming, since it is built on Keras, the high-level TensorFlow API. KerasRL has DQN, DDQN, DDPG, CDQN (NAF), CEM, etc., and uses the fully portable HDF5 container for network weight storage. That is why I started using it.
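As a minimal sketch of what that looks like (network size, hyperparameters, and file names are illustrative, not what will go into the pull request), a KerasRL DQN agent whose weights are stored as portable HDF5:

```python
import gym
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory

env = gym.make('CartPole-v0')
nb_actions = env.action_space.n

# Simple feed-forward Q-network.
model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(16, activation='relu'))
model.add(Dense(nb_actions, activation='linear'))

agent = DQNAgent(model=model, nb_actions=nb_actions,
                 memory=SequentialMemory(limit=50000, window_length=1),
                 policy=EpsGreedyQPolicy(), nb_steps_warmup=100,
                 target_model_update=1e-2)
agent.compile(Adam(lr=1e-3), metrics=['mae'])

agent.fit(env, nb_steps=50000, visualize=False, verbose=2)
# HDF5 weight file: portable across platforms and Python versions.
agent.save_weights('dqn_CartPole-v0_weights.h5f', overwrite=True)
```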
I implemented a trainer which allows the agent/environment combination to be switched, so we can compare later with OpenAI Baselines. I am going to open a pull request with a simple example so that you can see what I have in mind.
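A rough sketch of the idea (the actual class in the pull request may differ): the agent/environment pairing is passed in, so the same loop can later be reused to compare KerasRL against OpenAI Baselines.

```python
class Trainer:
    """Runs a given agent on a given Gym environment."""

    def __init__(self, agent, env, weights_path):
        self.agent = agent
        self.env = env
        self.weights_path = weights_path

    def train(self, nb_steps, resume=False):
        if resume:
            # Continue from the latest saved weight-set.
            self.agent.load_weights(self.weights_path)
        self.agent.fit(self.env, nb_steps=nb_steps, visualize=False, verbose=2)
        self.agent.save_weights(self.weights_path, overwrite=True)

    def enjoy(self, nb_episodes=5):
        # Load the latest weights and run the agent without training.
        self.agent.load_weights(self.weights_path)
        self.agent.test(self.env, nb_episodes=nb_episodes, visualize=True)
```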
I will work on my trainer class now and add it to my next pull request.
I saw that there is a structure for the gym stuff that I will follow for the other gym implementations: train* for training and enjoy* for testing. I will add an optional parameter for continuing training from the latest saved weight-set, and make enjoy load the latest saved weight-set (see the sketch below). The current gyms do not save to the data folder, is that intended? Also there are no saves except one without a name. Any suggestions for the structure there? Otherwise I will make the saves in data/ using KerasRL with the following scheme: -.whatever. In my experiments I trained agents with different underlying algorithms, so I will think about a structure for that as well.
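A hypothetical sketch of the train*/enjoy* split with the resume option; the data/ path and file name are placeholders, not the final naming scheme:

```python
import os

# Placeholder weight file name, not the final scheme.
WEIGHTS = os.path.join('data', 'dqn_CartPole-v0_weights.h5f')

def train(agent, env, nb_steps=50000, continue_training=False):
    # Optionally continue from the latest saved weight-set.
    if continue_training and os.path.exists(WEIGHTS):
        agent.load_weights(WEIGHTS)
    agent.fit(env, nb_steps=nb_steps, visualize=False, verbose=2)
    agent.save_weights(WEIGHTS, overwrite=True)

def enjoy(agent, env, nb_episodes=5):
    # Always run from the latest saved weight-set.
    agent.load_weights(WEIGHTS)
    agent.test(env, nb_episodes=nb_episodes, visualize=True)
```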