ascentai / diy-gym

A framework for creating your own reinforcement learning environments using pybullet
MIT License

Use w/ stable_baselines #33

Open MartinaRuocco opened 3 years ago

MartinaRuocco commented 3 years ago

Hi @thomascent,

I've been trying to use your envs with a stable_baselines algorithm (here's the cleaned-up repository), but I had to make a few adjustments to get them compatible:

  1. normalize the action space and make it symmetric
  2. flatten the observation space and action space
  3. sum the rewards
  4. compress the terminal signal
  5. vectorize the environment

I also had to make a quick fix to the observation space boundaries, because the `reset()` method would return an observation that falls outside the observation space (?). These issues were detected using the `check_env()` and `set_env()` methods from stable_baselines; a sketch of the kind of wrapper I ended up with follows below.
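This is illustrative rather than diy-gym's actual API: I'm assuming the env exposes `gym.spaces.Dict` observation/action spaces and returns per-addon reward/terminal dicts, and the class name and dict handling are my own.

```python
import numpy as np
import gym
from gym import spaces


class FlattenWrapper(gym.Wrapper):
    """Flattens Dict spaces, rescales actions to [-1, 1], and collapses
    the per-addon reward/terminal dicts (an assumption) into scalars."""

    def __init__(self, env):
        super(FlattenWrapper, self).__init__(env)
        self.observation_space = spaces.flatten_space(env.observation_space)
        flat_act = spaces.flatten_space(env.action_space)
        self._act_low, self._act_high = flat_act.low, flat_act.high
        # Symmetric, normalized action space for the RL algorithm
        self.action_space = spaces.Box(low=-1.0, high=1.0,
                                       shape=flat_act.shape, dtype=np.float32)

    def reset(self, **kwargs):
        return spaces.flatten(self.env.observation_space, self.env.reset(**kwargs))

    def step(self, action):
        # Map the symmetric [-1, 1] action back onto the env's original bounds
        unscaled = self._act_low + 0.5 * (np.asarray(action) + 1.0) * (self._act_high - self._act_low)
        obs, rew, done, info = self.env.step(spaces.unflatten(self.env.action_space, unscaled))
        rew = sum(rew.values()) if isinstance(rew, dict) else rew      # sum the rewards
        done = any(done.values()) if isinstance(done, dict) else done  # compress terminals
        return spaces.flatten(self.env.observation_space, obs), rew, done, info
```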

I used the example from the readme and tried to train a PPO2 model with 6e5 training steps, but unfortunately this is the result (the values printed on the terminal are the rewards). I believe the training affects only one joint and not the others, so the arm only stretches. Any idea how to approach this problem? Also, you mentioned that you tested your environments with other agents; could you please upload a working example (e.g. the TD3 one you mentioned)?
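For reference, this is roughly what my training script does, assuming the FlattenWrapper sketched above and that DIYGym is constructed from a config file as in the readme (the config path here is a placeholder):

```python
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines.common.env_checker import check_env
from diy_gym import DIYGym

env = FlattenWrapper(DIYGym('path/to/from_readme.yaml'))  # placeholder path
check_env(env)                     # flags space/observation mismatches up front
venv = DummyVecEnv([lambda: env])  # stable_baselines expects a vectorized env
model = PPO2('MlpPolicy', venv, verbose=1)
model.learn(total_timesteps=int(6e5))
```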

MartinaRuocco commented 3 years ago

[UPDATE] I tried the TD3 algorithm with the from_readme environment; here's my attempt. Unfortunately, it raises `MemoryError: Unable to allocate 119. GiB for an array with shape (100000, 160003) and data type float64`, because the observation space is too big.
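(For what it's worth, the allocation is just buffer rows × row width × 8 bytes, i.e. 100000 × 160003 × 8 B ≈ 119 GiB, so shrinking the replay buffer would at least make it fit in RAM; `buffer_size` is a TD3 argument in stable_baselines, and 5000 below is an arbitrary choice:)

```python
from stable_baselines import TD3

# env is the wrapped from_readme environment from the script above
model = TD3('MlpPolicy', env, buffer_size=5000, verbose=1)
model.learn(total_timesteps=int(1e5))
```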

So I tried the ur_high_5 environment instead, and this time the error is:

```
(...)
  File "/home/p16325mr/diy-gym/diy_gym/utils.py", line 95, in pop
    return self.arr[self.i - n:self.i]
IndexError: invalid index to scalar variable.
```
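From the traceback it looks like `self.arr` holds a numpy scalar rather than an array at that point; a two-liner reproduces the same message (illustrative only, not diy-gym's code):

```python
import numpy as np

arr = np.float64(3.14)  # a numpy scalar, not an ndarray
arr[0:2]                # IndexError: invalid index to scalar variable.
```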

Any help is very much appreciated :D