Hello, and thanks for your work. I tried to use MARLlib to train in my custom environment with different algorithms. It seems that if training with HATRPO is started immediately after HAPPO, the error "RuntimeError: Function 'MulBackward0' returned nan values in its 0th output" occurs. The following code can be used to reproduce the error:
I'm also attaching the full log. MARLlib was installed with conda, no GPU was used, and training was launched in local mode; the error does not reproduce with local_mode=False.