RicardoDominguez / PyCREPS

Contextual Relative Entropy Policy Search for Reinforcement Learning in Python

some suggestions for improvements on the GPREPS case studies #18

Seas00n commented 1 month ago

Hello, after reviewing your project I have learned a lot. Thanks for your great work!!

I have found a small issue in cartpole_learn.py for GPREPS.

I've noticed that after each systemRollout the new X and Y are concatenated onto the data from previous rollouts, but across each concatenation boundary the current and next states no longer correspond: the last state of one rollout gets paired with the first state of the following rollout. These spurious transitions cause incorrect system dynamics to be fitted. After skipping the discontinuous indices when fitting, the program learns much faster: originally it required 7 fittings of the GP model, but now it only requires 3.
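To make this concrete, here is a toy example (my own numbers and a 1-D state, just for illustration) of what the delta targets look like when next states are taken by shifting the concatenated trajectory:

import numpy as np

# Two hypothetical rollouts of a 1-D state.
roll_a = np.array([0.0, 1.0, 2.0, 3.0])
roll_b = np.array([10.0, 9.0, 8.0])

traj = np.concatenate([roll_a, roll_b])
X, Y = traj[:-1], traj[1:]   # "current" and "next" states
print(Y - X)                 # [ 1.  1.  1.  7. -1. -1.]
# The 7.0 at index 3 pairs the end of roll_a with the start of roll_b,
# a transition the system never made, yet the GP is asked to fit it.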

Here are some of my modifications. I use an interval_list to record the length of each continuous rollout:

x, y = systemRollout(env, hpol, pol)
interval_list.append(x.shape[0])
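
For context, the collection loop might then look roughly like this (the loop structure and the names X, Y, n_rollouts and fm are my own sketch, not the exact code in cartpole_learn.py):

import numpy as np

X, Y, interval_list = None, None, []
for _ in range(n_rollouts):               # n_rollouts: hypothetical budget
    x, y = systemRollout(env, hpol, pol)
    interval_list.append(x.shape[0])      # remember where this rollout ends
    X = x if X is None else np.concatenate((X, x))
    Y = y if Y is None else np.concatenate((Y, y))
fm.fit_continuous(X, Y, interval_list)    # fm: the GP forward model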

And fit only the continuous data:

import numpy as np

def fit_continuous(self, X, Y, interval_list):
    Xt = X[:, self.dyni]
    Yt = Y[:, self.difi] - X[:, self.difi]
    # interval_list holds the length of each rollout, so the rows that
    # straddle a rollout boundary sit at the cumulative lengths minus one
    boundary_idx = np.cumsum(interval_list) - 1
    Xt_continuous = np.delete(Xt, boundary_idx, axis=0)
    Yt_continuous = np.delete(Yt, boundary_idx, axis=0)
    for i in range(self.nout):
        try:
            self.gps[i].fit(Xt_continuous, Yt_continuous[:, i])
        except ValueError as e:
            print('ValueError caught for i: {0}, e: {1}'.format(i, e))
            raise
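
Note that since interval_list stores per-rollout lengths, the rows to drop are the cumulative sums minus one, not the raw lengths minus one; a quick check with toy numbers:

import numpy as np

interval_list = [4, 3, 5]            # three rollouts of lengths 4, 3, 5
print(np.cumsum(interval_list) - 1)  # [ 3  6 11]: last row of each rollout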