-
Hello, what great work you've done!
But I found something wrong in onpolicy/onpolicy/runner/shared/grid_runner.py.
![Screenshot from 2024-07-09 16-02-49](https://github.com/efc-robot/Explore-Bench…
-
Dear author, I am very interested in your work. May I ask how long it takes you to run an experiment?
-
I am confused by your code.
In the paper, it is mentioned that a policy gradient method [1] is used, but more specifically, I think it is implemented as Actor-Critic.
If I am wrong, please tell m…
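For what it's worth, the usual distinction can be made concrete in a few lines. This is a hedged sketch with made-up tensors, not the repository's code: plain policy gradient (REINFORCE) weights the log-probabilities by the raw return, while an actor-critic subtracts a learned value estimate as a baseline.

```python
import torch

# Hypothetical per-step quantities, just to contrast the two update rules.
log_probs = torch.randn(5, requires_grad=True)  # ln pi(a_t | s_t)
returns = torch.randn(5)                        # Monte-Carlo returns G_t
values = torch.randn(5)                         # critic estimates V(s_t)

# REINFORCE (plain policy gradient): weight log-probs by full returns.
reinforce_loss = -(log_probs * returns).sum()

# Actor-critic: the critic's value serves as a baseline, so the policy
# gradient uses the advantage (G_t - V(s_t)) instead of G_t.
actor_critic_loss = -(log_probs * (returns - values)).sum()
```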
-
Several deep RL agents are missing, such as A2C and A3C, which could be added. Further work could also add MARL agents such as MAA2C or MADDPG.
-
Dear Simar et al.,
First of all, I would like to thank you for your research. I believe it is very well done and deserves to be studied carefully to learn from your perspectives, methods, and insig…
-
Thanks for the reply. I have been busy with another project over the last few days, but recently I got some spare time.
I have noticed that in comm_net, the variables of the communication part (maybe along with the encoder part) a…
-
It seems that you update the critic before the actor.
As far as I know, actor_loss is calculated through the critic network, so the backward pass of actor_loss will affect the gradients of the critic parameters.
S…
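For illustration, here is a minimal PyTorch sketch of the interaction in question (hypothetical module shapes, not the repository's code). actor_loss.backward() does write gradients into the critic's parameters, but they are harmless as long as the critic's optimizer zeroes its gradients before its own backward pass:

```python
import torch
import torch.nn as nn

state_dim, action_dim = 4, 2
critic = nn.Linear(state_dim + action_dim, 1)  # Q(s, a)
actor = nn.Linear(state_dim, action_dim)       # deterministic policy
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

state = torch.randn(8, state_dim)
action = torch.randn(8, action_dim)
target_q = torch.randn(8, 1)

# 1) Critic update: zero_grad here discards any stale critic gradients.
critic_opt.zero_grad()
critic_loss = (critic(torch.cat([state, action], -1)) - target_q).pow(2).mean()
critic_loss.backward()
critic_opt.step()

# 2) Actor update: the loss flows through the critic, so backward() also
# fills the critic parameters' .grad fields ...
actor_opt.zero_grad()
actor_loss = -critic(torch.cat([state, actor(state)], -1)).mean()
actor_loss.backward()
actor_opt.step()  # ... but only the actor's optimizer steps, and the
                  # critic grads are cleared by the next critic_opt.zero_grad().
```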
-
```python
import argparse
import datetime
import os
import sys
import pprint
import numpy as np
import torch
# Add the parent directory to the system path
sys.path.append('..')
from ti…
-
When executing:
results = DRLAgent.DRL_prediction_load_from_file(model_name='maesac',environment=test_trade_gym, cwd=model_path)
the following error is raised:
RuntimeError: Error(s) in loading state_dict for SACPolicy:
size misma…
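For context, this kind of error is generic PyTorch behavior rather than something specific to this library: load_state_dict fails with a size mismatch whenever the network built at load time has different layer shapes than the one that produced the checkpoint, for instance when the test environment exposes a different state or action dimension than the training environment. A minimal sketch with hypothetical dimensions:

```python
import torch
import torch.nn as nn

# A checkpoint saved from a layer trained on 10-dimensional input ...
torch.save(nn.Linear(10, 64).state_dict(), "demo_actor.pth")

# ... cannot be loaded into a layer built for 12-dimensional input.
net = nn.Linear(12, 64)
net.load_state_dict(torch.load("demo_actor.pth"))
# RuntimeError: Error(s) in loading state_dict for Linear:
#     size mismatch for weight: copying a param with shape
#     torch.Size([64, 10]) from checkpoint, the shape in current
#     model is torch.Size([64, 12]).
```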
-
In this example https://github.com/keras-team/keras-io/blob/master/examples/rl/actor_critic_cartpole.py, the gradient for the actor is defined as the gradient of the loss $L = \sum_t \ln \pi(a_t \mid s_t)\,(G_t - V(s_t))$, where $G_t$ is the discounted return and $V(s_t)$ is the critic's value estimate.…
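Reading that loss in code terms (a hedged sketch with made-up numbers and my own variable names; as far as I recall, the linked example builds the same per-step quantity inside a tf.GradientTape): the example minimizes the negative of $L$, i.e. $-\ln\pi(a_t \mid s_t)\,(G_t - V(s_t))$ summed over the episode, so gradient descent on that quantity is gradient ascent on $L$.

```python
import numpy as np

# Made-up episode data; names are mine, not the example's.
log_probs = np.array([-0.2, -0.9, -0.4])  # ln pi(a_t | s_t)
returns = np.array([3.0, 2.0, 1.0])       # discounted returns G_t
values = np.array([2.5, 1.8, 0.9])        # critic outputs V(s_t)

# Minimizing sum(-log_prob * (return - value)) is minimizing -L,
# i.e. ascending the gradient of L from the formula above.
actor_loss = np.sum(-log_probs * (returns - values))
print(actor_loss)
```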