The scripts fail when running the HATRPO algorithm with an RNN network.

Hello, I tried to run your code with the HATRPO algorithm and an RNN network. Specifically, I added "--use_recurrent_policy" to both scripts, train_smac.sh and train_mujoco, and set algo='hatrpo'. However, both scripts fail with the error below:

RuntimeError: the derivative for '_cudnn_rnn_backward' is not implemented. Double backwards is not supported for CuDNN RNNs due to limitations in the CuDNN API. To run double backwards, please disable the CuDNN backend temporarily while running the forward pass of your RNN. For example: with torch.backends.cudnn.flags(enabled=False): output = model(inputs)
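
For reference, here is a minimal sketch of the workaround the error message suggests. It is not the repository's code: the `nn.GRU` layer, the tensor shapes, and the squared-output loss are hypothetical stand-ins, chosen only to show a forward pass with cuDNN disabled followed by two backward passes. I assume the double backward comes from a Hessian-vector (Fisher-vector) product of the kind a TRPO-style update needs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical stand-in for the recurrent policy; not the repository's network.
gru = nn.GRU(input_size=4, hidden_size=8).to(device)
inputs = torch.randn(5, 3, 4, device=device)  # (seq_len, batch, features)

# Disable cuDNN only around the forward pass, as the error message suggests.
# The RNN then falls back to the native implementation, which supports double backward.
with torch.backends.cudnn.flags(enabled=False):
    output, _ = gru(inputs)

# Hypothetical scalar objective, standing in for the KL term.
loss = output.pow(2).mean()
params = list(gru.parameters())

# First backward: keep the graph so the gradient itself can be differentiated.
grads = torch.autograd.grad(loss, params, create_graph=True)
flat_grad = torch.cat([g.reshape(-1) for g in grads])

# Second backward: Hessian-vector product with an arbitrary vector.
vector = torch.randn_like(flat_grad)
hvp = torch.autograd.grad((flat_grad * vector).sum(), params)
flat_hvp = torch.cat([h.reshape(-1) for h in hvp])
```

Wrapping only the forward pass keeps cuDNN enabled for the rest of training, though the native RNN path will likely be slower than the cuDNN one.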

cyanrain7 / TRPO-in-MARL

The scripts fail when running the HATRPO algorithm with an RNN network. #14