Summary [cluster-multiagent-shared-v4]

MultiagentSharedEnv seems to be quite a difficult environment, even without collisions.

Test-time rollout for k=6: video (Summary: in 1000 steps, 5 of 6 agents reach the target, although they get close within a few steps.)

cathywu commented 7 years ago

Summary [cluster-multiagent-shared-v6]

MultiagentSharedEnv, done_epsilon=0.01, batch_size=10000, k=50 (WHY does an (exponential) discount factor of 0.5 cause such a regression?): 2017-05-16-multiagentsharedenv-spatialdiscount-doneeps0 01-batch10000-k50

Next I will try binary and linear discounting instead of exponential.
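For reference, a minimal sketch of how the three spatial discounting schemes could differ. This is not the code used in these experiments: the function names `spatial_weights` and `discounted_rewards`, the `radius` parameter, and the exact weight formulas are assumptions made for illustration, and the `gamma` here may not be parameterized the same way as in the runs above. The idea is that each agent's training signal is a weighted sum of all agents' rewards, with the weight falling off with distance.

```python
import numpy as np


def spatial_weights(positions, kind="exponential", gamma=0.5, radius=1.0):
    """Return a (k, k) matrix W; W[i, j] weights agent j's reward for agent i.

    All formulas below are illustrative assumptions, not the thread's code.
    """
    positions = np.asarray(positions, dtype=float)
    # Pairwise Euclidean distances between agents, shape (k, k).
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)

    if kind == "exponential":
        # Weight decays geometrically with distance: gamma ** d.
        return gamma ** dist
    if kind == "binary":
        # Full weight within the radius, zero weight outside it.
        return (dist <= radius).astype(float)
    if kind == "linear":
        # Weight falls linearly from 1 at d = 0 to 0 at d = radius.
        return np.clip(1.0 - dist / radius, 0.0, 1.0)
    raise ValueError(f"unknown discounting kind: {kind!r}")


def discounted_rewards(rewards, positions, **kwargs):
    """Spatially discounted reward for each agent (weighted sum over agents)."""
    weights = spatial_weights(positions, **kwargs)
    return weights @ np.asarray(rewards, dtype=float)


# Three agents on a line at x = 0, 1, 3, each with reward 1.
positions = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
rewards = [1.0, 1.0, 1.0]
exp_r = discounted_rewards(rewards, positions, kind="exponential", gamma=0.5)
bin_r = discounted_rewards(rewards, positions, kind="binary", radius=1.0)
lin_r = discounted_rewards(rewards, positions, kind="linear", radius=2.0)
```

Under these assumptions, binary and linear discounting cut off distant agents entirely, whereas exponential discounting always leaves some contribution from every agent.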

cathywu commented 7 years ago

Summary [cluster-multiagent-shared-v7]

Results:

Clear separation of training convergence for different binary spatial discount factors.
Results are similar with linear spatial discounting.

Pending:

[ ] Waiting for training to end to generate rollout videos.
[ ] Generating a control experiment with gamma=100 and extreme discounting for comparison with gamma=0.01, 0.05. Would like a small discount to do better than both of these.

MultiagentSharedEnv, done_epsilon=0.01, batch_size=10000, collision_penalty=50, k=25 (first positive result on spatial discounting!): 2017-05-16-multiagentsharedenv-binaryspatialdiscount-donesps0 01-batch10000-k25

MultiagentSharedEnv, done_epsilon=0.01, batch_size=10000, collision_penalty=50, k=50: 2017-05-16-multiagentsharedenv-binaryspatialdiscount-donesps0 05-batch10000-k25

cathywu commented 7 years ago

Summary [cluster-multiagent-shared-v9]

Note: local run, 1 seed

Results:

An environment with asymmetric rewards does not seem to make a difference. (Local optimization still does just as well as small spatial discounting.)

2017-06-15-multiagentsharedenv-assymetricreward

cathywu / rllab

Spatial discounting #14
