ascentai / diy-gym

A framework for creating your own reinforcement learning environments using pybullet
MIT License

Use w/ stable_baselines #33

Open MartinaRuocco opened 3 years ago

MartinaRuocco commented 3 years ago

Hi @thomascent,

I've been trying to use your envs with a stable_baselines algorithm (here's the cleaned-up repository), but I had to make a few adjustments to get them compatible:

  1. normalize the action space and make it symmetric
  2. flatten the observation space and action space
  3. sum the rewards
  4. compress the terminal signal
  5. vectorize the environment

I also had to make a quick fix to the observation space boundaries, because the `reset()` method would return an observation that falls outside the observation space (?). These issues were detected using the `check_env()` and `set_env()` methods from stable_baselines; a sketch of the kind of wrapper I ended up with follows below.
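This is illustrative rather than diy-gym's actual API: I'm assuming the env exposes `gym.spaces.Dict` observation/action spaces and returns per-addon reward/terminal dicts, and the class name and dict handling are my own.

```python
import numpy as np
import gym
from gym import spaces


class FlattenWrapper(gym.Wrapper):
    """Flattens Dict spaces, rescales actions to [-1, 1], and collapses
    the per-addon reward/terminal dicts (an assumption) into scalars."""

    def __init__(self, env):
        super(FlattenWrapper, self).__init__(env)
        self.observation_space = spaces.flatten_space(env.observation_space)
        flat_act = spaces.flatten_space(env.action_space)
        self._act_low, self._act_high = flat_act.low, flat_act.high
        # Symmetric, normalized action space for the RL algorithm
        self.action_space = spaces.Box(low=-1.0, high=1.0,
                                       shape=flat_act.shape, dtype=np.float32)

    def reset(self, **kwargs):
        return spaces.flatten(self.env.observation_space, self.env.reset(**kwargs))

    def step(self, action):
        # Map the symmetric [-1, 1] action back onto the env's original bounds
        unscaled = self._act_low + 0.5 * (np.asarray(action) + 1.0) * (self._act_high - self._act_low)
        obs, rew, done, info = self.env.step(spaces.unflatten(self.env.action_space, unscaled))
        rew = sum(rew.values()) if isinstance(rew, dict) else rew      # sum the rewards
        done = any(done.values()) if isinstance(done, dict) else done  # compress terminals
        return spaces.flatten(self.env.observation_space, obs), rew, done, info
```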

I used the example from the readme and tried to train a PPO2 model with 6e5 training steps, but unfortunately this is the result (the values printed on the terminal are the rewards). I believe the training affects only one joint and not the others, so the arm only stretches. Any idea how to approach this problem? Also, you mentioned that you tested your environments with other agents; could you please upload a working example (e.g. the TD3 one you mentioned)?
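For reference, this is roughly what my training script does, assuming the FlattenWrapper sketched above and that DIYGym is constructed from a config file as in the readme (the config path here is a placeholder):

```python
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines.common.env_checker import check_env
from diy_gym import DIYGym

env = FlattenWrapper(DIYGym('path/to/from_readme.yaml'))  # placeholder path
check_env(env)                     # flags space/observation mismatches up front
venv = DummyVecEnv([lambda: env])  # stable_baselines expects a vectorized env
model = PPO2('MlpPolicy', venv, verbose=1)
model.learn(total_timesteps=int(6e5))
```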

MartinaRuocco commented 3 years ago

[UPDATE] I tried the TD3 algorithm with the from_readme environment; here's my attempt. Unfortunately, it raises `MemoryError: Unable to allocate 119. GiB for an array with shape (100000, 160003) and data type float64`, because the observation space is too big.
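(For what it's worth, the allocation is just buffer rows × row width × 8 bytes, i.e. 100000 × 160003 × 8 B ≈ 119 GiB, so shrinking the replay buffer would at least make it fit in RAM; `buffer_size` is a TD3 argument in stable_baselines, and 5000 below is an arbitrary choice:)

```python
from stable_baselines import TD3

# env is the wrapped from_readme environment from the script above
model = TD3('MlpPolicy', env, buffer_size=5000, verbose=1)
model.learn(total_timesteps=int(1e5))
```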

So I tried the ur_high_5 environment instead, and this time the error is:

```
(...)
  File "/home/p16325mr/diy-gym/diy_gym/utils.py", line 95, in pop
    return self.arr[self.i - n:self.i]
IndexError: invalid index to scalar variable.
```
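From the traceback it looks like `self.arr` holds a numpy scalar rather than an array at that point; a two-liner reproduces the same message (illustrative only, not diy-gym's code):

```python
import numpy as np

arr = np.float64(3.14)  # a numpy scalar, not an ndarray
arr[0:2]                # IndexError: invalid index to scalar variable.
```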

Any help is very much appreciated :D