Replicable-MARL / MARLlib

One repository is all that is necessary for Multi-agent Reinforcement Learning (MARL)
https://marllib.readthedocs.io
MIT License

Confusing results in simple spread environment #236

Open Destiny000621 opened 6 months ago

Destiny000621 commented 6 months ago

I ran the COMA, HATRPO, and MAPPO algorithms in the Simple Spread environment for 500,000 timesteps, and none of them achieved a reward higher than -100. However, most of the rewards in the results folder are in the range of -30 to -40. After training, the reward is even lower than it was at the start. The model parameters I used are the same as the ones in the results folder.

from marllib import marl

# prepare env
env = marl.make_env(environment_name="mpe", map_name="simple_spread", force_coop=True)

# initialize algorithm with appointed hyper-parameters
coma = marl.algos.coma(hyperparam_source='mpe')

# build agent model based on env + algorithms + user preference
model = marl.build_model(env, coma, {"core_arch": "gru", "encode_layer": "128-256"})

# start training
coma.fit(env, model, stop={'timesteps_total': 500000}, share_policy='group', checkpoint_freq=100000, checkpoint_end=True)
florin-pop commented 5 months ago

Are you plotting episode_reward_mean or episode_reward_max? I suspect that the "reward" in the results CSV is ray/tune/episode_reward_max, but I may be wrong.
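
One quick way to check is to load the progress.csv that Ray Tune writes into the run directory and plot both columns side by side. This is only a minimal sketch, assuming the standard Ray/Tune result columns (episode_reward_mean, episode_reward_max, timesteps_total); the path below is a placeholder for your own experiment directory:

import pandas as pd
import matplotlib.pyplot as plt

# Placeholder: point this at the progress.csv inside your run directory
progress_path = "path/to/your/experiment/progress.csv"

df = pd.read_csv(progress_path)

# Plot both reward statistics over training timesteps to see which one
# matches the -30 to -40 curves shown in the results folder
plt.plot(df["timesteps_total"], df["episode_reward_mean"], label="episode_reward_mean")
plt.plot(df["timesteps_total"], df["episode_reward_max"], label="episode_reward_max")
plt.xlabel("timesteps_total")
plt.ylabel("episode reward")
plt.legend()
plt.show()

If the episode_reward_max curve sits around -30 to -40 while episode_reward_mean stays below -100, the gap between your runs and the published results folder may just be a difference in which statistic is being compared.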