Hello! I have learned a lot from reviewing your project. Thanks for your great help!
I have found a small issue in cartpole_learn.py for GPREPS.
I noticed that after each systemRollout, the new X and Y are concatenated with the existing data, but at the junction between two rollouts the current and next states no longer correspond. Fitting on those rows can give incorrect system dynamics. After skipping these discontinuous indices when fitting the dynamics, the program learns much faster: it originally needed 7 GP model fits, and now it needs only 3.
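To make the boundary problem concrete, here is a minimal toy sketch (not code from cartpole_learn.py). It assumes the next-state targets are formed by shifting the concatenated state buffer by one row; `np.roll` only stands in for however the script actually builds Y, and `toy_rollout` is a hypothetical helper.

```python
import numpy as np

# Hypothetical toy system: the state increases by exactly 1 each step,
# so every genuine transition has delta = 1.
def toy_rollout(start, length):
    return np.arange(start, start + length, dtype=float).reshape(-1, 1)

# Two rollouts of lengths 4 and 3, concatenated as after each systemRollout
X = np.vstack([toy_rollout(0.0, 4), toy_rollout(100.0, 3)])

# Assumption: targets are the concatenated buffer shifted by one row,
# so the last sample of each rollout is paired with an unrelated state.
Y = np.roll(X, -1, axis=0)
deltas = (Y - X).ravel()

# Rows whose "transition" is not a real step of the system
bad_rows = np.flatnonzero(deltas != 1.0)
print(bad_rows.tolist())  # [3, 6]: the last row of each rollout

# Dropping those rows leaves only genuine transitions
clean = np.delete(deltas, bad_rows)
print(bool(np.all(clean == 1.0)))  # True
```

Even one spurious row like index 3 (a jump of 97 instead of 1) is a large outlier for a GP fitted on state differences.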
Here are my modifications. I use an `interval_list` to record the length of each continuous rollout:
```python
x, y = systemRollout(env, hpol, pol)
interval_list.append(x.shape[0])  # length of this rollout
```
Then I fit the dynamics only on the continuous data:
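```python
import numpy as np

def fit_continuous(self, X, Y, interval_list):
    # Inputs and targets (state differences) for the dynamics GPs
    Xt = X[:, self.dyni]
    Yt = Y[:, self.difi] - X[:, self.difi]
    # interval_list holds per-rollout lengths, so the last row of each
    # rollout in the concatenated arrays sits at cumsum(lengths) - 1
    boundary = np.cumsum(interval_list) - 1
    Xt_continuous = np.delete(Xt, boundary, axis=0)
    Yt_continuous = np.delete(Yt, boundary, axis=0)
    # Fit one GP per output dimension on the continuous transitions only
    for i in range(self.nout):
        try:
            self.gps[i].fit(Xt_continuous, Yt_continuous[:, i])
        except ValueError as e:
            print('ValueError caught for i:{0}: e:{1}'.format(i, e))
            raise
```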
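If `interval_list` holds per-rollout lengths (as `x.shape[0]` suggests), the boundary rows are at the cumulative sums minus one, not at the raw lengths minus one; the two only agree for the first rollout. A minimal sketch of that index computation, where `continuous_rows` is a hypothetical helper name rather than part of cartpole_learn.py:

```python
import numpy as np

def continuous_rows(interval_list):
    """Boolean mask keeping every row except the last sample of each rollout.

    Assumes interval_list holds per-rollout lengths, in the same order the
    rollouts were concatenated into X and Y.
    """
    lengths = np.asarray(interval_list, dtype=int)
    keep = np.ones(int(lengths.sum()), dtype=bool)
    keep[np.cumsum(lengths) - 1] = False  # last row of each rollout
    return keep

interval_list = [4, 3, 5]
keep = continuous_rows(interval_list)
print(np.flatnonzero(~keep).tolist())         # [3, 6, 11]
print((np.array(interval_list) - 1).tolist())  # [3, 2, 4]: wrong rows after the first rollout
```

A boolean mask like this can be applied as `Xt[keep]`, which selects the same rows as `np.delete(Xt, np.cumsum(interval_list) - 1, axis=0)`.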