Replicable-MARL / MARLlib

One repository is all that is necessary for Multi-agent Reinforcement Learning (MARL)
https://marllib.readthedocs.io
MIT License

Reward is always negative and never becomes positive #143

Closed A0ce closed 1 year ago

A0ce commented 1 year ago

First of all, thank you for the project. I have only been testing it a little, but I noticed that if I extend your example slightly and use the Pommerman environment with, e.g., the QMIX algorithm on the team-competition map (PommeTeamCompetition-v0), the reward never rises above 0, no matter how many episodes I train.

    from marllib import marl

    # Pommerman team-competition environment
    env = marl.make_env(environment_name='pommerman', map_name='PommeTeamCompetition-v0')
    # QMIX with the "test" hyper-parameter preset
    vdn = marl.algos.qmix(hyperparam_source="test")
    model = marl.build_model(env, vdn, {"core_arch": "gru", "encode_layer": "128-256"})

    vdn.fit(env, model, stop={"training_iteration": 100}, local_mode=False, num_gpus=1, num_workers=2, share_policy="group")

The output looks roughly like the following, except that only seven iterations were run here as a quick test, which is of course far too few. However, the same pattern also shows up with 10,000 iterations. In other scenarios, too, the reward is not limited to the range between -1 and 0.

+-----------------------------------------------------------+----------+----------------------+--------+------------------+------+-----------+----------------------+----------------------+--------------------+
| Trial name                                                | status   | loc                  |   iter |   total time (s) |   ts |    reward |   episode_reward_max |   episode_reward_min |   episode_len_mean |
|-----------------------------------------------------------+----------+----------------------+--------+------------------+------+-----------+----------------------+----------------------+--------------------|
| VDN_grouped_pommerman_PommeTeamCompetition-v0_c47d5_00000 | RUNNING  | 192.168.178.45:13767 |      7 |          22.2609 | 9784 | -0.428571 |                    0 |                   -2 |            232.952 |
+-----------------------------------------------------------+----------+----------------------+--------+------------------+------+-----------+----------------------+----------------------+--------------------+

Any idea what the reason could be?

Theohhhu commented 1 year ago

Thank you for your interest in our work. If the reward never becomes positive, there could be several potential reasons to consider:

  1. The task may be too difficult for the current approach.
  2. The algorithm used may not be fine-tuned or suitable for the task.
  3. The reward design might be inadequate or ineffective.

Based on my understanding, I believe the issue can be attributed to either point 1 or point 2. Previous research on Pommerman has rarely relied solely on pure reinforcement learning (RL). Instead, such work often incorporates explicit or implicit signals to guide the learning process.
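To make the "guiding signal" idea concrete, here is a minimal reward-shaping sketch for a generic multi-agent, Gym-style Pommerman interface. The wrapper name, the bonus terms, and the coefficients are all hypothetical and are not part of MARLlib or the official Pommerman package; they only illustrate the kind of dense signal (e.g., surviving a step, placing a bomb) that prior Pommerman work typically adds on top of the sparse end-of-episode reward.

    # Hypothetical sketch only: not MARLlib / Pommerman API.
    class ShapedRewardWrapper:
        """Adds small dense bonuses on top of the environment's sparse reward."""

        def __init__(self, env, alive_bonus=0.001, bomb_bonus=0.01):
            self.env = env
            self.alive_bonus = alive_bonus  # small reward for surviving a step
            self.bomb_bonus = bomb_bonus    # small reward for placing a bomb

        def reset(self):
            return self.env.reset()

        def step(self, actions):
            obs, rewards, done, info = self.env.step(actions)
            shaped = []
            for r, a in zip(rewards, actions):
                bonus = self.alive_bonus
                if a == 5:  # in Pommerman, action 5 places a bomb
                    bonus += self.bomb_bonus
                shaped.append(r + bonus)
            return obs, shaped, done, info

Whether shaping like this actually helps, and how to register such a wrapped environment with MARLlib, is beyond this sketch; it is only meant to show the sort of extra signal that pure sparse-reward RL on Pommerman usually lacks.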

To explore this further, you may find valuable resources and references at the following link: Pommerman Resources. They can provide additional insight and guidance for improving your approach.