Thank you for your interest in our work. If the reward never becomes positive, there could be several potential reasons to consider:
Based on my understanding, the issue is most likely due to either point 1 or point 2. Previous research on Pommerman has rarely relied on pure reinforcement learning (RL) alone. Instead, most approaches incorporate explicit or implicit signals to guide the learning process.
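One common kind of explicit guiding signal is reward shaping. The sketch below shows potential-based shaping, r' = r + γΦ(s') − Φ(s), which adds dense intermediate feedback without changing which policies are optimal. It is only an illustration and is not necessarily what the Pommerman work referenced here uses. `TeamState`, its fields, and the `potential` function are hypothetical stand-ins, not part of the Pommerman API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeamState:
    """Hypothetical summary of a team-game state (not a pommerman API type)."""
    alive_teammates: int
    alive_enemies: int


def potential(state: TeamState) -> float:
    """Hypothetical potential: higher when our team outnumbers the enemy team."""
    return 0.5 * (state.alive_teammates - state.alive_enemies)


def shaped_reward(env_reward: float, state: TeamState, next_state: TeamState,
                  gamma: float = 0.99, done: bool = False) -> float:
    """Potential-based shaping: env_reward + gamma * phi(s') - phi(s).

    The potential of a terminal next state is fixed at 0, as required for
    shaping in episodic tasks to leave the optimal policy unchanged.
    """
    next_phi = 0.0 if done else potential(next_state)
    return env_reward + gamma * next_phi - potential(state)


if __name__ == "__main__":
    # An enemy dies mid-episode: the sparse env reward is 0, but shaping adds a bonus.
    print(shaped_reward(0.0, TeamState(2, 2), TeamState(2, 1)))
```

With a sparse ±1 terminal reward, the agents otherwise get no feedback until the episode ends. The shaping term rewards progress (here, reducing the enemy team) at every step.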
To explore and address this challenge further, you may find valuable resources and references at the following link: Pommerman Resources. They can provide additional insights and guidance for improving your approach.
First of all, thank you for the project. While experimenting with it a little, I noticed something: if I extend your example slightly, use the Pommerman environment with an algorithm such as QMIX, and switch the Pommerman map to Teammatch, the rewards never go above 0, no matter how many episodes I train.
The result looks about the same as shown here, except that this run used only seven iterations as a test, which is of course too few. However, the same pattern appears with 10,000 iterations. In other scenarios, by contrast, the reward is not confined to the range between -1 and 0.
Any idea what the reason could be?
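For context, here is a small worked example. It assumes the standard Pommerman team-game reward: each agent receives +1 when its team wins, -1 on a loss or a tie (timeout or both teams eliminated), and 0 on intermediate steps. It also assumes the plotted value is a single agent's episode return. How the framework actually aggregates rewards across agents is an assumption here. Under these assumptions, the mean episode reward for win rate w is w·(+1) + (1 − w)·(−1) = 2w − 1. A curve that stays between -1 and 0 therefore means the team wins at most half of its games against its opponents.

```python
def mean_episode_reward(win_rate: float) -> float:
    """Expected per-agent episode reward, assuming +1 for a win and -1 for a loss or tie."""
    if not 0.0 <= win_rate <= 1.0:
        raise ValueError("win_rate must be in [0, 1]")
    return win_rate * 1.0 + (1.0 - win_rate) * -1.0


def win_rate_from_reward(mean_reward: float) -> float:
    """Invert mean_reward = 2 * win_rate - 1 to recover the implied win rate."""
    return (mean_reward + 1.0) / 2.0


if __name__ == "__main__":
    for w in (0.0, 0.25, 0.5, 0.75):
        print(f"win rate {w:.2f} -> mean reward {mean_episode_reward(w):+.2f}")
```

Reading a plotted mean reward through `win_rate_from_reward` turns it into an implied win rate. For example, a curve hovering near -0.5 would correspond to winning about a quarter of the episodes.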