Replicable-MARL / MARLlib

One repository is all that is necessary for Multi-agent Reinforcement Learning (MARL)
https://marllib.readthedocs.io
MIT License

Reward is always negative and never becomes positive #143

Closed A0ce closed 1 year ago

A0ce commented 1 year ago

First of all, thank you for the project. I have only been testing it a little, but I noticed that if I extend your example slightly and use the Pommerman environment with, e.g., the QMIX algorithm on the team-competition map (PommeTeamCompetition-v0), the reward never rises above 0, no matter how many episodes I train.

    from marllib import marl

    # Pommerman team-competition environment
    env = marl.make_env(environment_name='pommerman', map_name='PommeTeamCompetition-v0')
    # QMIX with the "test" hyper-parameter preset
    vdn = marl.algos.qmix(hyperparam_source="test")
    model = marl.build_model(env, vdn, {"core_arch": "gru", "encode_layer": "128-256"})

    vdn.fit(env, model, stop={"training_iteration": 100}, local_mode=False, num_gpus=1, num_workers=2, share_policy="group")

The output looks roughly like the following, except that only seven iterations were run here as a quick test, which is of course far too few. However, the same pattern also shows up with 10,000 iterations. In other scenarios, too, the reward is not limited to the range between -1 and 0.

+-----------------------------------------------------------+----------+----------------------+--------+------------------+------+-----------+----------------------+----------------------+--------------------+
| Trial name                                                | status   | loc                  |   iter |   total time (s) |   ts |    reward |   episode_reward_max |   episode_reward_min |   episode_len_mean |
|-----------------------------------------------------------+----------+----------------------+--------+------------------+------+-----------+----------------------+----------------------+--------------------|
| VDN_grouped_pommerman_PommeTeamCompetition-v0_c47d5_00000 | RUNNING  | 192.168.178.45:13767 |      7 |          22.2609 | 9784 | -0.428571 |                    0 |                   -2 |            232.952 |
+-----------------------------------------------------------+----------+----------------------+--------+------------------+------+-----------+----------------------+----------------------+--------------------+

Any idea what the reason could be?

Theohhhu commented 1 year ago

Thank you for your interest in our work. If the reward never becomes positive, there could be several potential reasons to consider:

  1. The task may be too difficult for the current approach.
  2. The algorithm used may not be fine-tuned or suitable for the task.
  3. The reward design might be inadequate or ineffective.

Based on my understanding, I believe the issue can be attributed to either point 1 or point 2. Previous research on Pommerman has rarely relied solely on pure reinforcement learning (RL). Instead, such work often incorporates explicit or implicit signals to guide the learning process.
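To make the "guiding signal" idea concrete, here is a minimal reward-shaping sketch for a generic multi-agent, Gym-style Pommerman interface. The wrapper name, the bonus terms, and the coefficients are all hypothetical and are not part of MARLlib or the official Pommerman package; they only illustrate the kind of dense signal (e.g., surviving a step, placing a bomb) that prior Pommerman work typically adds on top of the sparse end-of-episode reward.

    # Hypothetical sketch only: not MARLlib / Pommerman API.
    class ShapedRewardWrapper:
        """Adds small dense bonuses on top of the environment's sparse reward."""

        def __init__(self, env, alive_bonus=0.001, bomb_bonus=0.01):
            self.env = env
            self.alive_bonus = alive_bonus  # small reward for surviving a step
            self.bomb_bonus = bomb_bonus    # small reward for placing a bomb

        def reset(self):
            return self.env.reset()

        def step(self, actions):
            obs, rewards, done, info = self.env.step(actions)
            shaped = []
            for r, a in zip(rewards, actions):
                bonus = self.alive_bonus
                if a == 5:  # in Pommerman, action 5 places a bomb
                    bonus += self.bomb_bonus
                shaped.append(r + bonus)
            return obs, shaped, done, info

Whether shaping like this actually helps, and how to register such a wrapped environment with MARLlib, is beyond this sketch; it is only meant to show the sort of extra signal that pure sparse-reward RL on Pommerman usually lacks.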

To explore this further, you may find valuable resources and references at the following link: Pommerman Resources. They can provide additional insight and guidance for improving your approach.