Log for mediator's probs

Hi Dmitry, I hope this message finds you well!

I've been reading your paper, "Mediated Multi-Agent Reinforcement Learning." The idea is fascinating, and I'm trying to reproduce the results using the code provided on GitHub. However, the action mapping in the logs looks inconsistent between the mediator and the agents. For the environment agents, action 0 is "defect," 1 is "cooperate," and 2 is "commit." For the mediator, action 0 is "cooperate" and 1 is "defect," which is the reverse of the agents' mapping.

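In controller.py, I noticed that when an agent chooses to commit, the mediator's action is passed to the environment in that agent's place: actions_to_env[i] = actions_mediator[i]. If the environment reads that index using the agents' mapping, a mediator action of 0 would be executed as "defect," even though the mediator's log labels it "cooperate."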
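To make the concern concrete, here is a minimal sketch of how I understand the substitution. The function name resolve_actions, the COMMIT constant, and the two label tables are my own assumptions based on the logs, not code taken from the repository; only the assignment actions_to_env[i] = actions_mediator[i] comes from controller.py.

```python
# Label tables as I read them from the logs (assumed, not copied from the repo).
AGENT_LABELS = {0: "defect", 1: "cooperate", 2: "commit"}
MEDIATOR_LABELS = {0: "cooperate", 1: "defect"}
COMMIT = 2  # hypothetical constant for the agents' "commit" action


def resolve_actions(actions_agents, actions_mediator):
    """Replace each committing agent's action with the mediator's action."""
    actions_to_env = list(actions_agents)
    for i, action in enumerate(actions_agents):
        if action == COMMIT:
            # Same assignment as the line I quoted from controller.py.
            actions_to_env[i] = actions_mediator[i]
    return actions_to_env


# Agent 0 commits, agent 1 cooperates; the mediator picks action 0 for both.
actions_to_env = resolve_actions([2, 1], [0, 0])
print(actions_to_env)  # [0, 1]

# The same index 0 gets two different names depending on which table is used.
print(MEDIATOR_LABELS[0], "vs", AGENT_LABELS[actions_to_env[0]])  # cooperate vs defect
```

If the environment really does interpret index 0 as "defect," then either the mediator's log labels are swapped or the mediator's actions should be remapped before they reach the environment. I can't tell which from the code alone.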

Could there be an issue with the logging, or am I misunderstanding something? I would appreciate your clarification.

Thank you very much!

dimonenka / mediatedMARL