The algorithm can not be converged.

iddbh commented 4 months ago

My environment is Linux with Python 3.8. I find that the algorithm gets stuck under some conditions. This is strange, because I am running the same code. The actions chosen by the models are always the same, and the total reward is always about -128. I could not find the reason.

deligentfool commented 4 months ago

Thank you for your interest in the project. If the situation you describe happens only occasionally, I have noticed it before as well. My best guess is that it is caused by insufficient exploration in the original code. The original implementation selects actions directly by taking the maximum value, as shown below: https://github.com/mlii/mfrl/blob/d9a2dbca6f50687a2d2c2f0d613dc57b4cc4f9a0/examples/battle_model/algo/base.py#L45 https://github.com/mlii/mfrl/blob/d9a2dbca6f50687a2d2c2f0d613dc57b4cc4f9a0/examples/battle_model/algo/base.py#L136

self.predict = tf.nn.softmax(self.e_q / self.temperature)
actions = self.sess.run(self.predict, feed_dict=feed_dict)
actions = np.argmax(actions, axis=1).astype(np.int32)

In my opinion, this implementation means the temperature coefficient has no effect. Softmax preserves the ordering of the Q-values for any positive temperature, so the argmax, and therefore the chosen action, is the same regardless of the temperature's value.
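The point about the temperature can be checked with a small standalone sketch. It is not taken from either repository: the Q-values are made up, and `softmax` and `greedy_action` are hypothetical helpers written in plain Python that only mimic the softmax-then-argmax logic quoted above.

```python
import math


def softmax(q_values, temperature):
    """Numerically stable softmax of q_values / temperature."""
    scaled = [q / temperature for q in q_values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_action(q_values, temperature):
    """Mimic the quoted logic: softmax, then argmax over the probabilities."""
    probs = softmax(q_values, temperature)
    return max(range(len(probs)), key=probs.__getitem__)


# Made-up Q-values for four actions (assumption, for illustration only).
q_values = [0.2, 1.5, -0.3, 1.1]

# Index of the selected action for a cold, neutral, and hot temperature.
chosen = {t: greedy_action(q_values, t) for t in (0.1, 1.0, 10.0)}
print(chosen)  # {0.1: 1, 1.0: 1, 10.0: 1}
```

The temperature changes how peaked the probabilities are, but not which entry is largest, so a greedy pick always returns the same action.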

So, when implementing the PyTorch version of mfrl, I added two lines of code, as shown below: https://github.com/deligentfool/mfrl_pytorch/blob/ea91c7851adbce8a31db35c1ac830692d83361e1/algo/base.py#L103-L104 Uncommenting these two lines will significantly alleviate the issue you mentioned. However, to stay consistent with the original code, I ultimately decided to keep them commented out. I have seen similar issues raised in GitHub repositories for other variants of mfrl, where someone expressed an opinion similar to mine.
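One common way to add exploration is to sample the action from the softmax distribution instead of taking its argmax. The sketch below is an assumption about what such a change could look like, not a copy of the two commented lines in the linked file. The Q-values, the `sample_action` helper, and the fixed seed are all hypothetical.

```python
import math
import random
from collections import Counter


def softmax(q_values, temperature):
    """Numerically stable softmax of q_values / temperature."""
    scaled = [q / temperature for q in q_values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(q_values, temperature, rng):
    """Draw one action index with probability given by the softmax."""
    probs = softmax(q_values, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
q_values = [0.2, 1.5, -0.3, 1.1]  # made-up Q-values for four actions

# Count how often each action is drawn at each temperature.
histograms = {}
for temperature in (0.1, 1.0, 10.0):
    draws = (sample_action(q_values, temperature, rng) for _ in range(1000))
    histograms[temperature] = Counter(draws)
    print(temperature, sorted(histograms[temperature].items()))
```

With sampling, a low temperature concentrates almost all draws on the highest-valued action, while a high temperature spreads them across actions, so the temperature coefficient now affects behaviour.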

Since this code was written three years ago, and I have not been closely following mean-field-related work for some time, I would suggest that you carefully compare the original code with my PyTorch implementation yourself. When it comes to issues with mfrl implementations, contacting the authors of the original paper directly is often the quickest way to resolve them.

If you obtain a convincing answer, please be sure to let me know, even though I have not delved into the mfrl domain since implementing this code.

iddbh commented 4 months ago

Thank you very much! I had noticed the temperature problem before; I set a constant temperature and got the same result. I will follow your instructions to edit the code.

deligentfool commented 4 months ago

I'm going to close this issue now. Feel free to reach out again if you have any further questions or need assistance.
