Summary: the action-dependent baseline neither improves nor regresses training (the training curves look identical across all experiments so far), even though in the NoStateEnv case with k=6 its explained variance is much larger.
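For reference, a minimal sketch (not the experiment code) of the quantity being compared: the only intended difference between the two conditions is whether the baseline may condition on the action as well as the state when forming advantage estimates. `baseline_fn` and `advantage_estimates` below are hypothetical names, and the plain subtraction omits the correction term a real action-dependent baseline needs to keep the policy gradient unbiased.

```python
import numpy as np

# Illustrative only: `baseline_fn` is a hypothetical fitted regressor, and the
# unbiasedness correction required for an action-dependent baseline is omitted.
def advantage_estimates(returns, obs, actions, baseline_fn, action_dependent=True):
    """A = R - b, where b is either b(s, a) or b(s)."""
    if action_dependent:
        inputs = np.concatenate([obs, actions], axis=1)  # baseline sees (s, a)
    else:
        inputs = obs                                     # baseline sees s only
    return returns - baseline_fn(inputs)
```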
NoStateEnv, with k=6, average return:
NoStateEnv, with k=50, average return:
MultiactionPointEnv, with k=6, average return:
NoStateEnv, with k=6, explained variance:

Furthermore, as a bonus, this somewhat confirms that I didn't accidentally run with the same (non-action-dependent) baseline for all the experiments.
New env: OneStepNoStateEnv (commit efb7e17)
OneStepNoStateEnv, with k=6, average return (the two curves are exactly overlapping):
OneStepNoStateEnv, with k=6, explained variance (huhhh?):
OneStepNoStateEnv, with k=6, batch size=100, average return (the two curves are exactly overlapping):
OneStepNoStateEnv, with k=6, batch size=100, explained variance (huhhhhhh):
The high explained variance (values of 1) of the GaussianMLPBaseline results from low variance among the return values once a good policy has been learned, e.g.:

```python
>>> np.var(returns)
8.584473133996777e-09
```
The baseline is then overfitting to the only value it sees.
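To make that failure mode concrete, here is a small sketch of the explained-variance computation using the usual definition, 1 - Var(y - y_pred) / Var(y) (the exact rllab helper may differ in edge-case handling). When the returns are essentially constant, a baseline that just memorizes that single value pushes the ratio to ~1 even though it carries no useful signal.

```python
import numpy as np

def explained_variance(y_pred, y):
    """1 - Var(y - y_pred) / Var(y); undefined (here: nan) if Var(y) == 0."""
    var_y = np.var(y)
    return np.nan if var_y == 0 else 1.0 - np.var(y - y_pred) / var_y

# Nearly constant returns, as observed after the policy has converged.
returns = 1.0 + 1e-4 * np.random.randn(1000)
constant_pred = np.full_like(returns, returns.mean())  # baseline memorizes the value
print(explained_variance(constant_pred, returns))      # ~1.0, despite no real signal
```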
OneStepNoStateEnv, with k=6:
OneStepNoStateEnv, with k=50:
OneStepNoStateEnv, with k=200:
Hypothesis: poor fits of the NN baselines (since the linear feature baselines below seem to match the performance of the whitened ZeroBaseline here).
Recommendation: Ignore the NN baseline results in these runs.
From Rocky: It shouldn't be a fitting problem. The `center_adv` option should undo everything the baseline does, since it re-centers the advantage estimates.
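To spell out that argument with a sketch (standard rllab-style advantage whitening is assumed here, not quoted from the actual code): in an environment with no state, a state-only baseline predicts the same value for every sample, so subtracting it is a constant shift, and re-centering the advantages removes exactly that shift.

```python
import numpy as np

# Sketch, assuming center_adv performs standard whitening of the advantages.
def whiten(adv, eps=1e-8):
    return (adv - adv.mean()) / (adv.std() + eps)

returns = np.random.randn(1000)
constant_baseline = returns.mean()            # a state baseline in a no-state env
adv_with_baseline = returns - constant_baseline
adv_without_baseline = returns

# After whitening, the two advantage vectors coincide: the baseline had no effect.
print(np.allclose(whiten(adv_with_baseline), whiten(adv_without_baseline)))  # True
```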
OneStepNoStateEnv, with k=200, holdout loss for the NN baseline (vf):
OneStepNoStateEnv, with k=200, holdout loss for the NN baseline (vf0):
I'm now running experiments with higher dimensions to see a more pronounced effect (k=1000, 2000).

EDIT (2017-05-02): Added figures for k=500, 1000, 2000.
OneStepNoStateEnv, with k=6, no whitening (center_adv=False):
OneStepNoStateEnv, with k=50, no whitening (center_adv=False):
OneStepNoStateEnv, with k=200, no whitening (center_adv=False) (First positive result!):
OneStepNoStateEnv, with k=500, no whitening (center_adv=False) (Positive result!):
OneStepNoStateEnv, with k=1000, no whitening (center_adv=False) (Less conclusive.):
OneStepNoStateEnv, with k=2000, no whitening (center_adv=False) (Hitting scaling issues.):
MultiactionPointEnv, k=6, no whitening, done=reach origin:
MultiactionPointEnv, k=1000, no whitening, done=reach origin:
MultiagentPointEnv, k=6, no whitening, done=reach origin:
MultiagentPointEnv, k=50, no whitening, done=reach origin:
MultiagentPointEnv, k=200, no whitening, done=reach origin:
MultiagentPointEnv, k=500, no whitening, done=reach origin:
MultiagentPointEnv, k=200, no whitening, done=reach origin, batch size=100:
MultiagentPointEnv, k=200, no whitening, done=reach origin, batch size=500:
MultiagentPointEnv, k=200, no whitening, done=reach origin, batch size=1000:
Experiment snapshot from commit 43f7576.