
cathywu / rllab

rllab is a framework for developing and evaluating reinforcement learning algorithms, fully compatible with OpenAI Gym.

Spatial discounting #13

Open cathywu opened 7 years ago

cathywu commented 7 years ago

Major restructuring to support a shared policy (currently named spatial_discounting; to be refactored in the next commit)
Updated run script for spatial discounting
MultiagentPointEnv: re-written for shared policy
MultiagentPointEnv: re-written with local observation space resembling LIDAR (d=2 only).

cathywu commented 7 years ago

Testing done:

python3 examples/cluster_multiagent_point_comparison.py

Here's a small indication that the tensor restructuring and reshaping hasn't totally screwed things up: learning is still happening (plot: 2017-05-08-multiagentsharedenv-perhapsimplementationisok).
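
For readers unfamiliar with the kind of reshaping a shared policy needs, here is a minimal sketch. It assumes observations are stored as an array of shape (n_envs, n_agents, obs_dim), which is not confirmed by this thread; the helper names `flatten_agents` and `unflatten_agents` are hypothetical and are not part of rllab.

```python
import numpy as np

def flatten_agents(obs):
    """(n_envs, n_agents, obs_dim) -> (n_envs * n_agents, obs_dim).

    Every agent becomes one row, so a single shared policy can be
    evaluated on all agents in one batched call. (Assumed layout.)
    """
    n_envs, n_agents, obs_dim = obs.shape
    return obs.reshape(n_envs * n_agents, obs_dim)

def unflatten_agents(flat, n_agents):
    """(n_envs * n_agents, k) -> (n_envs, n_agents, k)."""
    return flat.reshape(-1, n_agents, flat.shape[-1])

# Toy data: 2 environments, 3 agents, 4-dimensional observations.
obs = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
flat = flatten_agents(obs)
restored = unflatten_agents(flat, n_agents=3)
```

A round trip like this (flatten, then unflatten, then compare with the input) is a cheap sanity check that agent and environment axes have not been mixed up.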

cathywu commented 7 years ago

Summary: cluster-multiagent-shared-v1 (commit cf52512)

Adds support for shared policy (see TRPOShared, NPOShared, SharedGaussianMLPPolicy, and .shared_policy attribute).
Implements spatial discounting (see rllab/misc/special.py:spatial_discount())
Updates run script for spatial discounting
MultiagentPointEnv: re-written for shared policy
MultiagentPointEnv: re-written with local observation space resembling LIDAR (d=2 only).
Currently running experiment cluster-multiagent-shared-v1 (see cluster_multiagent_point_comparison.py): spatial discounting with spatial discount rates [0.5, 0.7, 0.8, 0.97, 0.99, 0.995, 1], swept over various numbers of agents, batch sizes, etc.
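
The body of rllab/misc/special.py:spatial_discount() is not shown in this thread, so the following is only one plausible reading of "spatial discounting", not the committed implementation. The assumption is that each agent's credited reward is a sum of all agents' rewards, weighted by the discount rate raised to the Euclidean distance between agents. The function name `spatial_discount_sketch` and its signature are hypothetical.

```python
import numpy as np

def spatial_discount_sketch(rewards, positions, gamma):
    """Hypothetical spatial discounting: out[i] = sum_j gamma**d(i, j) * r[j].

    rewards:   (n_agents,) per-agent rewards at one time step
    positions: (n_agents, d) agent coordinates
    gamma:     spatial discount rate in [0, 1]
    """
    rewards = np.asarray(rewards, dtype=float)
    positions = np.asarray(positions, dtype=float)
    # Pairwise Euclidean distances, shape (n_agents, n_agents).
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Nearby agents contribute more; an agent's own reward has weight 1.
    weights = gamma ** dist
    return weights @ rewards

positions = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
rewards = [1.0, 2.0, 4.0]
local = spatial_discount_sketch(rewards, positions, gamma=0.5)
shared = spatial_discount_sketch(rewards, positions, gamma=1.0)
```

Under this assumed definition, a rate of 1 gives every agent the full team reward, while smaller rates shrink the contribution of distant agents, which is consistent with sweeping rates from 0.5 up to 1.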

cathywu commented 7 years ago

cluster-multiagent-shared-v2 (commit a7e2bce)

Example visualization of the LIDAR view from agent 0: the initial state from 2 concurrently sampled environments with 50 agents and 10 angular slices of the space. Red is agent 0, blue marks the other agents, and purple crosses are the LIDAR measurements projected back into Cartesian coordinates (images: lidar-reset--3335646985623688744-agent0, lidar-reset-592221439877259022-agent0).
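
A rough sketch of how such a LIDAR-style observation could be computed and projected back for plotting. This is an illustration under assumptions, not the MultiagentPointEnv code: it assumes d=2, that each angular slice reports the distance to the nearest other agent in that slice, and that empty slices report `max_range`. The function names are hypothetical.

```python
import numpy as np

def lidar_observation(positions, agent, n_slices=10, max_range=np.inf):
    """Distance to the nearest other agent in each angular slice (2-D only)."""
    positions = np.asarray(positions, dtype=float)
    rel = np.delete(positions, agent, axis=0) - positions[agent]
    dist = np.hypot(rel[:, 0], rel[:, 1])
    angle = np.arctan2(rel[:, 1], rel[:, 0])  # in [-pi, pi]
    width = 2 * np.pi / n_slices
    # Map each angle to a slice index; the modulo wraps angle == pi to slice 0.
    idx = np.floor((angle + np.pi) / width).astype(int) % n_slices
    obs = np.full(n_slices, max_range, dtype=float)
    np.minimum.at(obs, idx, dist)  # keep the closest hit per slice
    return obs

def lidar_to_cartesian(obs, origin, n_slices=10):
    """Project finite measurements back to x/y along each slice's center angle."""
    width = 2 * np.pi / n_slices
    centers = -np.pi + (np.arange(n_slices) + 0.5) * width
    hit = np.isfinite(obs)
    direction = np.stack([np.cos(centers[hit]), np.sin(centers[hit])], axis=1)
    return np.asarray(origin, dtype=float) + obs[hit][:, None] * direction

# Agent 0 at the origin with 4 slices; two agents share the same slice.
positions = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [-1.0, -1.0]]
obs = lidar_observation(positions, agent=0, n_slices=4)
points = lidar_to_cartesian(obs, origin=positions[0], n_slices=4)
```

Because the projection uses the slice's center angle rather than the true bearing, the projected points generally sit near, not exactly on, the agents that produced them.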

cathywu commented 7 years ago

Update: cluster-multiagent-shared-v2 (commit a7e2bce)

After clipping actions, movements between steps seem more reasonable (images: lidar-reset--3597117061730238138-agent0, lidar-7851020778695200524-agent0).
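
For illustration, action clipping of this kind usually amounts to clamping each action component to the action-space bounds before the environment applies it. The bounds of -1 and 1 below are assumptions for the example; the thread does not state the actual limits, and `clip_actions` is a hypothetical helper.

```python
import numpy as np

def clip_actions(actions, low, high):
    """Clamp each action component to [low, high] before it is applied."""
    return np.clip(np.asarray(actions, dtype=float), low, high)

# Two agents, 2-D actions; the first component of each is out of bounds.
raw = np.array([[3.0, -0.2], [-5.0, 0.4]])
clipped = clip_actions(raw, low=-1.0, high=1.0)
```

Without the clamp, an unbounded Gaussian policy can occasionally emit very large actions, which would show up as the implausibly large jumps between steps.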