cathywu / rllab

rllab is a framework for developing and evaluating reinforcement learning algorithms, fully compatible with OpenAI Gym.

Simple envs for sanity checking #9

Open cathywu opened 7 years ago

cathywu commented 7 years ago

Experiment snapshot from commit 43f7576.

python3 examples/cluster_multiagent_point_comparison.py
cathywu commented 7 years ago

Preliminary results [exp cluster-multiagent-v10]

Summary: the action-dependent baseline neither improves nor degrades training (the training curves look identical across all experiments so far), even though in the NoStateEnv case with k=6 its explained variance is much larger.
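(For reference, the explained variance reported in these plots is the usual 1 - Var[returns - baseline] / Var[returns]; rllab logs it via rllab.misc.special.explained_variance_1d, if I'm reading the logging code right. A minimal NumPy sketch of the metric:)

```python
import numpy as np

def explained_variance(ypred, y):
    """1 - Var[y - ypred] / Var[y].

    ~1 means the baseline predicts the returns almost perfectly;
    ~0 means it does no better than predicting a constant."""
    vary = np.var(y)
    if vary == 0:
        return 0.0  # degenerate case: the returns are constant
    return 1.0 - np.var(y - ypred) / vary
```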

NoStateEnv, with k=6, average return: 2017-04-26-nostateenv-k6

NoStateEnv, with k=50, average return: 2017-04-26-nostateenv-k50

MultiactionPointEnv, with k=6, average return: 2017-04-26-multiactionpointenv-k6

NoStateEnv, with k=6, explained variance: 2017-04-26-nostateenv-k6-explainedvariance

As a bonus, this somewhat confirms that I didn't accidentally run the same (non-action-dependent) baseline for all the experiments.

cathywu commented 7 years ago

New env: OneStepNoStateEnv (commit efb7e17)

OneStepNoStateEnv, with k=6, average return (the two curves are exactly overlapping): 2017-04-26-onestepnostateenv-k6

OneStepNoStateEnv, with k=6, explained variance (huhhh?): 2017-04-26-onestepnostateenv-k6-explainedvariance

cathywu commented 7 years ago

Possible reasons the action-dependent (AD) baseline is not effective

cathywu commented 7 years ago

OneStepNoStateEnv, with k=6, batch size=100, average return (the two curves are exactly overlapping): 2017-04-26-onestepnostateenv-k6-batch100

OneStepNoStateEnv, with k=6, batch size=100, explained variance (huhhhhhh): 2017-04-26-onestepnostateenv-k6-batch100-explainedvariance

The high explained variance (values of 1) reported for the GaussianMLPBaseline results from the low variance among the return values once a good policy has been learned, e.g.:

>>> np.var(returns)
8.584473133996777e-09

The baseline is then overfitting to the only value it sees.
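A toy illustration (synthetic numbers, not from the runs above): once the returns collapse to essentially a single value, a flexible baseline that memorizes them scores an explained variance near 1 even though there is almost no variance left to explain, so the metric stops saying anything useful.

```python
import numpy as np

rng = np.random.RandomState(0)

# Returns after the policy has converged: one value plus tiny noise
# (compare np.var(returns) ~ 1e-9 above).
returns = -0.5 + 1e-4 * rng.randn(1000)

# Stand-in for the overfit GaussianMLPBaseline: it reproduces the
# training returns almost exactly.
fitted = returns + 1e-6 * rng.randn(1000)

ev = 1.0 - np.var(returns - fitted) / np.var(returns)
print(np.var(returns), ev)  # Var[returns] ~ 1e-8, yet explained variance ~ 0.9999
```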

cathywu commented 7 years ago

Summary: ZeroBaseline (with whitening) does better! [exp cluster-multiagent-v11]

OneStepNoStateEnv, with k=6: 2017-04-28-zerobaselinerocksonestepenv

OneStepNoStateEnv, with k=50: 2017-04-28-zerobaselinerocksonestepenv-k50

OneStepNoStateEnv, with k=200: 2017-04-28-zerobaselinerocksonestepenv-k200

Hypothesis: the NN baselines are fitting poorly (the linear feature baselines below appear to match the performance of the whitened ZeroBaseline here).

Recommendation: Ignore the NN baseline results in these runs.

From Rocky: it shouldn't be a fitting problem. The center_adv option should undo everything the baseline does, since it recenters the advantage estimates.
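To spell out Rocky's point for the no-state case (a sketch, assuming the baseline conditions only on the observation): with no state to condition on, the baseline predicts the same constant b for every sample, and the centering/rescaling applied by center_adv=True is invariant to constant shifts, so the resulting advantages are identical with or without the baseline.

```python
import numpy as np

rng = np.random.RandomState(1)
returns = rng.randn(8)   # per-sample returns in a one-step, no-state env
b = 0.37                 # a state-only baseline: one constant prediction for all samples

def whiten(adv):
    # Roughly what center_adv=True does to the advantage estimates.
    return (adv - adv.mean()) / (adv.std() + 1e-8)

print(np.allclose(whiten(returns), whiten(returns - b)))  # True: the baseline is undone
```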

OneStepNoStateEnv, with k=200, holdout loss for the NN baseline (vf): 2017-04-29-onestepnostateenv-k200-holdoutloss-vf

OneStepNoStateEnv, with k=200, holdout loss for the NN baseline (vf0): 2017-04-29-onestepnostateenv-k200-holdoutloss-vf0

cathywu commented 7 years ago

Summary [exp cluster-multiagent-v12]

I'm now running experiments in higher dimensions (k=1000, 2000) to see a more pronounced effect. EDIT (2017-05-02): Added figures for k=500, 1000, 2000.

OneStepNoStateEnv, with k=6, no whitening (center_adv=False): 2017-04-29-onestepnostateenv-nowhitening-k6

OneStepNoStateEnv, with k=50, no whitening (center_adv=False): 2017-04-29-onestepnostateenv-nowhitening-k50

OneStepNoStateEnv, with k=200, no whitening (center_adv=False) (First positive result!): 2017-04-29-onestepnostateenv-nowhitening-k200

OneStepNoStateEnv, with k=500, no whitening (center_adv=False) (Positive result!): 2017-05-02-onestepnostateenv-nowhitening-k500

OneStepNoStateEnv, with k=1000, no whitening (center_adv=False) (Less conclusive.): 2017-05-19-onestepnostateenv-nowhitening-k1000

OneStepNoStateEnv, with k=2000, no whitening (center_adv=False) (Hitting scaling issues.): 2017-05-02-onestepnostateenv-nowhitening-k2000
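For reference, the "no whitening" variant is just the center_adv flag on the batch optimizer. A hedged sketch of how one of these runs might be configured (the OneStepNoStateEnv import path, the k constructor argument, and the policy/batch sizes are placeholders, not copied from cluster_multiagent_point_comparison.py):

```python
from rllab.algos.trpo import TRPO
from rllab.baselines.zero_baseline import ZeroBaseline
from rllab.policies.gaussian_mlp_policy import GaussianMLPPolicy

# Hypothetical import path: adjust to wherever OneStepNoStateEnv lives in this fork.
from multiagent.envs.one_step_no_state_env import OneStepNoStateEnv

env = OneStepNoStateEnv(k=200)  # k = action dimension (assumed constructor argument)
policy = GaussianMLPPolicy(env_spec=env.spec, hidden_sizes=(32, 32))
baseline = ZeroBaseline(env_spec=env.spec)

algo = TRPO(
    env=env,
    policy=policy,
    baseline=baseline,
    batch_size=4000,
    max_path_length=1,   # one-step episodes
    n_itr=100,
    center_adv=False,    # the "no whitening" setting in these runs
)
algo.train()
```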

cathywu commented 7 years ago

Summary [exp cluster-multiagent-v13]


MultiactionPointEnv, k=6, no whitening, done=reach origin: 2017-05-02-multiactionpointenv-doneisreachorigin-k6

MultiactionPointEnv, k=1000, no whitening, done=reach origin: 2017-05-02-multiactionpointenv-doneisreachorigin-k1000


MultiagentPointEnv, k=6, no whitening, done=reach origin: 2017-05-02-multiagentpointenv-doneisreachorigin-k6

MultiagentPointEnv, k=50, no whitening, done=reach origin: 2017-05-02-multiagentpointenv-doneisreachorigin-k50

MultiagentPointEnv, k=200, no whitening, done=reach origin: 2017-05-02-multiagentpointenv-doneisreachorigin-k200

MultiagentPointEnv, k=500, no whitening, done=reach origin: 2017-05-02-multiagentpointenv-doneisreachorigin-k500


MultiagentPointEnv, k=200, no whitening, done=reach origin, batch size=100: 2017-05-02-multiagentpointenv-doneisreachorigin-k200-batch_size100

MultiagentPointEnv, k=200, no whitening, done=reach origin, batch size=500: 2017-05-02-multiagentpointenv-doneisreachorigin-k200-batch_size500

MultiagentPointEnv, k=200, no whitening, done=reach origin, batch size=1000: 2017-05-02-multiagentpointenv-doneisreachorigin-k200-batch_size1000