-
The current implementation of `ActorCriticBase` makes it a bit tricky to have custom actor and critic networks with shared layers. This is because the instantiation of the networks happens in the `…
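For concreteness, here is a minimal sketch of the shared-layer pattern being asked for: one trunk feeding both a policy head and a value head. This is plain NumPy with invented names, not the `ActorCriticBase` API.

```python
import numpy as np

rng = np.random.default_rng(0)

class SharedTrunkActorCritic:
    """Illustrative shared-trunk network: one hidden layer feeds both
    a policy head and a value head (names are hypothetical)."""

    def __init__(self, obs_dim, act_dim, hidden=32):
        self.W_trunk = rng.normal(0, 0.1, (obs_dim, hidden))
        self.W_pi = rng.normal(0, 0.1, (hidden, act_dim))
        self.W_v = rng.normal(0, 0.1, (hidden, 1))

    def forward(self, obs):
        h = np.tanh(obs @ self.W_trunk)      # shared layer
        logits = h @ self.W_pi               # actor head
        value = (h @ self.W_v).squeeze(-1)   # critic head
        return logits, value

net = SharedTrunkActorCritic(obs_dim=4, act_dim=2)
logits, value = net.forward(np.zeros((1, 4)))
print(logits.shape, value.shape)  # (1, 2) (1,)
```

The point of the pattern is that `W_trunk` receives gradients from both heads, which is hard to express if actor and critic are instantiated as two unrelated networks.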
-
### Please describe the purpose of the feature. Is it related to a problem?
I am inquiring about possibly integrating JAX-based Graph Neural Networks (GNNs) into MAVA for use in MARL. Many MARL algor…
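To illustrate the kind of component being proposed, here is a toy GCN-style layer that averages each agent's features over its graph neighbourhood. This is a plain-NumPy sketch of generic message passing, not Mava's API or any particular JAX GNN library.

```python
import numpy as np

def gcn_layer(A, H, W):
    """H' = relu(D^-1 (A + I) H W): each node averages its own and its
    neighbours' features, then applies a linear map and ReLU."""
    A_hat = A + np.eye(A.shape[0])
    D_inv = 1.0 / A_hat.sum(axis=1, keepdims=True)
    return np.maximum(0.0, D_inv * (A_hat @ H) @ W)

# 3 agents on a line graph: 0 - 1 - 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.arange(6, dtype=float).reshape(3, 2)  # per-agent features
W = np.eye(2)                                # identity weights for clarity
print(gcn_layer(A, H, W))  # [[1. 2.] [2. 3.] [3. 4.]]
```

In a MARL setting the adjacency `A` would encode which agents can observe or communicate with each other, and the layer output would feed each agent's actor and critic.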
-
Baseline PPO agent:
- Critic represents total reward
- Actor is trained to maximize critic
CBF PPO agent:
- Base critic represents nominal reward
- CBF critic represents safety reward
- Actor…
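One way to read the two-critic setup above is that the actor's objective mixes a nominal advantage with a safety (CBF) advantage. The sketch below shows that combination inside a standard PPO clipped surrogate; the `safety_weight` knob and the additive combination are illustrative assumptions, not a description of the actual agent.

```python
import numpy as np

def combined_advantage(adv_nominal, adv_safety, safety_weight=1.0):
    """Assumed combination: nominal advantage plus weighted safety advantage."""
    return adv_nominal + safety_weight * adv_safety

def ppo_clip_loss(ratio, adv, eps=0.2):
    """Standard PPO clipped surrogate (returned as a loss to minimize)."""
    return -np.minimum(ratio * adv,
                       np.clip(ratio, 1 - eps, 1 + eps) * adv).mean()

ratio = np.array([1.1, 0.8])  # pi_new / pi_old for two sampled actions
adv = combined_advantage(np.array([1.0, -0.5]), np.array([0.2, -1.0]))
print(ppo_clip_loss(ratio, adv))
```

With this formulation the actor is pushed toward actions that score well on both critics, while each critic keeps its own target (nominal return vs. safety return).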
-
Thanks for the paper; it is really cool and useful.
On page 22 of the paper, it says
> For reincarnating D4PG using QDagger, we minimize a distillation loss between the D4PG’s actor policy and the …
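For reference while discussing the quote, one common form of policy distillation for a deterministic actor like D4PG's is a mean-squared error between teacher and student actions on replayed states. Whether QDagger uses exactly this form is what the question is asking; the snippet below is only the generic loss, with made-up inputs.

```python
import numpy as np

def distill_loss(student_actions, teacher_actions):
    """MSE between student and teacher actions on the same states
    (one plausible distillation loss for a deterministic policy)."""
    return np.mean((student_actions - teacher_actions) ** 2)

teacher = np.array([[0.5, -0.2], [0.1, 0.9]])  # teacher actions (illustrative)
student = np.array([[0.4, 0.0], [0.0, 1.0]])   # student actions (illustrative)
print(distill_loss(student, teacher))
```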
-
Here's a nice actor-critic reinforcement learning model that would be fun to re-implement in Nengo (and try different learning rules)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journ…
-
### What happened + What you expected to happen
# What happened
Using the `Algorithm.add_module` with a `module_state` does not use the module state, but instead loads or builds the module directly…
-
Implement and explore the effectiveness of an actor-critic agent.
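As a starting point, here is a minimal tabular one-step actor-critic on a toy two-state MDP. The environment and hyperparameters are invented for illustration; the structure (TD error drives both the critic update and the policy-gradient update) is the standard algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
theta = np.zeros((n_states, n_actions))  # policy logits
V = np.zeros(n_states)                   # critic (state values)
alpha_pi, alpha_v, gamma = 0.1, 0.1, 0.9

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step(s, a):
    # Toy dynamics: action 1 yields reward 1, action 0 yields 0.
    return (s + a) % n_states, float(a)

s = 0
for _ in range(500):
    probs = softmax(theta[s])
    a = rng.choice(n_actions, p=probs)
    s_next, r = step(s, a)
    td_error = r + gamma * V[s_next] - V[s]  # critic's TD error
    V[s] += alpha_v * td_error               # critic update
    grad = -probs
    grad[a] += 1.0                           # grad of log pi(a|s)
    theta[s] += alpha_pi * td_error * grad   # actor update
    s = s_next

print(theta[0, 1] > theta[0, 0])  # the rewarded action should be preferred
```

From here, "exploring effectiveness" could mean swapping in function approximation, n-step returns, or entropy regularization and comparing learning curves.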
-
Stock Dimension: 30, State Space: 2371
{'batch_size': 64, 'buffer_size': 100000, 'learning_rate': 0.001, 'learning_starts': 100, 'ent_coef': 'auto_0.1'}
Using cpu device
Logging to /content/drive…
-
Mr Siraskar,
Hello! I have read your paper and your code, which are very helpful to me; thank you for sharing your valuable work with us!
I have a question:
Why did you design the actor network …
-
It occurred to me that this recent paper is an interesting one to implement inside brax
One of the cool things about brax is its differentiability, but as I understand it, attempts to leverage that …