cyanrain7 / TRPO-in-MARL

Some questions about HAPPO implementation #10

Closed sachiel321 closed 2 years ago

sachiel321 commented 2 years ago

[image: equation (10) from the paper]

The above is equation (10) in the paper, but I can't find it in the current implementation.

In the file [screenshot of code]

And [screenshot of code]

I cannot find any preprocessing of the advantages like equation (10) in your paper.

I would appreciate knowing how the iterative updates in Algorithm 1 are represented in the code.

sachiel321 commented 2 years ago

I figured it out... You prepare a trainer for each agent in .

GoingMyWay commented 2 years ago

@sachiel321 Hi, I cannot find the implementation either. Can you point out where I can find it?
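
For readers following this thread, here is a minimal sketch of how "one trainer per agent" can express a sequential, HAPPO-style update. It is not the repository's code: `ToyTrainer`, `happo_round`, and the toy data are hypothetical names made up for illustration, and the policy is a state-independent softmax so the example stays self-contained. The assumption is that the advantage preprocessing being asked about amounts to multiplying the joint advantage by the product of the new/old probability ratios of the agents that were already updated in the current round.

```python
import numpy as np


def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


class ToyTrainer:
    """Hypothetical per-agent trainer with a state-independent softmax policy."""

    def __init__(self, n_actions, lr=0.1, clip=0.2, n_epochs=5):
        self.logits = np.zeros(n_actions)
        self.lr, self.clip, self.n_epochs = lr, clip, n_epochs

    def prob(self, actions):
        # Probability this agent's current policy assigns to each sampled action.
        return softmax(self.logits)[actions]

    def update(self, actions, old_prob, weighted_adv):
        # A few gradient-ascent steps on the PPO-clip surrogate
        # mean(min(ratio * M, clip(ratio) * M)), where M = weighted_adv.
        for _ in range(self.n_epochs):
            p = softmax(self.logits)
            ratio = p[actions] / old_prob
            clipped = np.clip(ratio, 1 - self.clip, 1 + self.clip)
            # The gradient flows only where the unclipped term is the minimum.
            active = ratio * weighted_adv <= clipped * weighted_adv
            onehot = np.eye(len(p))[actions]
            # d(ratio)/d(logits) = ratio * (onehot - p) for a softmax policy.
            grad = ((active * ratio * weighted_adv)[:, None] * (onehot - p)).mean(axis=0)
            self.logits += self.lr * grad


def happo_round(trainers, actions, advantages, rng):
    """Update agents one at a time in a random order (assumed scheme)."""
    factor = np.ones(len(advantages))          # product of ratios so far
    order = rng.permutation(len(trainers))     # fresh agent order each round
    for i in order:
        old_prob = trainers[i].prob(actions[:, i])
        # The agent sees the joint advantage scaled by the earlier agents' ratios.
        trainers[i].update(actions[:, i], old_prob, factor * advantages)
        new_prob = trainers[i].prob(actions[:, i])
        factor = factor * new_prob / old_prob  # fold this agent's ratio in
    return order, factor


# Toy batch: 2 agents, 2 actions each; the advantage depends only on agent 0.
rng = np.random.default_rng(0)
actions = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
advantages = np.array([1.0, 1.0, -1.0, -1.0])
trainers = [ToyTrainer(n_actions=2) for _ in range(2)]
order, factor = happo_round(trainers, actions, advantages, rng)
print("update order:", order)
print("final factor per sample:", factor)
```

In this sketch the only place the "preprocessing" shows up is the `factor * advantages` argument, which may be why it is easy to miss when searching a trainer class for an explicit version of the equation.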