ScaleMind-C9308A
/
some
Problems of MARL - PPO:
#12
Open
skydvn
opened
1 year ago
skydvn
commented
1 year ago
We have not yet implemented the training of the Critic for PPO.
We should train the Actor and the Critic separately. To be more specific:
The Actor will use the advantage loss.
The Critic will use the MSE loss between the Q value inferred by the Critic and the true Q.
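To make the proposed split concrete, here is a minimal sketch of the two losses as separate functions, written in plain Python so it runs without any framework. It is an illustration under assumptions, not the repository's code: the function names `actor_loss` and `critic_loss`, the `clip_eps` parameter, and the use of the standard clipped PPO surrogate as the "advantage loss" are all hypothetical choices, and the "true Q" is assumed to be some precomputed target such as a discounted return.

```python
import math


def actor_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped PPO surrogate (assumed form of the 'advantage loss').

    Negated so that minimising the result improves the policy.
    Advantages are treated as fixed numbers, not as something to optimise.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated and the behaviour policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic (lower) bound of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


def critic_loss(predicted_q, target_q):
    """Mean squared error between the Critic's Q estimates and the targets."""
    total = sum((p - t) ** 2 for p, t in zip(predicted_q, target_q))
    return total / len(target_q)


# Hypothetical batch of two samples.
print(actor_loss([0.0, 0.0], [0.0, 0.0], [1.0, -1.0]))
print(critic_loss([1.0, 2.0], [1.0, 4.0]))
```

In an autograd framework, each of these losses would typically be backpropagated through its own network with its own optimizer, which is one way to keep the Actor and Critic updates separate as suggested above.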