ScaleMind-C9308A / some


Problems of MARL - PPO: #12

Open skydvn opened 1 year ago

skydvn commented 1 year ago
  1. We have not yet implemented the training of the Critic for PPO.
  2. We should train the Actor and the Critic separately. More specifically:
    • The Actor will use the advantage loss.
    • The Critic will use the MSE loss between the Q inferred by the Critic and the true Q.
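
A minimal sketch of the two separate losses, in plain Python so it runs without any framework. This is not the repository's code: the function names `actor_loss` and `critic_loss`, the `clip_eps=0.2` default, and the reading of "advantage loss" as the standard PPO clipped surrogate objective are all assumptions made for illustration. The "true Q" is assumed to be some target such as an empirical return.

```python
import math


def actor_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss for the Actor (assumed form of the 'advantage loss').

    Returns the negated mean surrogate objective, so minimizing it
    maximizes the clipped, advantage-weighted probability ratio.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        # Clip the ratio to [1 - eps, 1 + eps] to limit the policy update size.
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic (minimum) choice between unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


def critic_loss(predicted_q, target_q):
    """Mean squared error between the Critic's predicted Q and the target Q."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predicted_q, target_q)]
    return sum(squared_errors) / len(squared_errors)
```

In an actual training loop, each loss would be backpropagated through its own network with its own optimizer, so that Actor updates do not change Critic parameters and vice versa.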