-
scripts/config/main/webrl.yaml:
defaults:
  - default
  - _self_
save_path: /workspace/WebRL/scripts/output
run_name: "webrl"
critic_lm: …
# training
policy_lm: /workspace/WebRL/webrl-glm-4-9…
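The `defaults` list with `_self_` is Hydra-style composition: the shared `default` config is loaded first, then this file's own keys override it. A minimal sketch of how such a config is typically consumed, assuming Hydra is the loader and that the `config_path`/`config_name` below match the file location shown above (not verified against the WebRL code):

```python
# Hypothetical entry point that composes scripts/config/main/webrl.yaml via Hydra.
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(config_path="scripts/config/main", config_name="webrl", version_base=None)
def main(cfg: DictConfig) -> None:
    # `default` is merged first, then this file's keys (_self_) override it
    print(OmegaConf.to_yaml(cfg))
    print(cfg.policy_lm, cfg.critic_lm)  # the two model paths set above

if __name__ == "__main__":
    main()
```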
-
If I understand the current PPO code correctly, this instantiates completely separate actor and critic models, with no layers shared between them. (Please correct me if that's wrong.)
Instea…
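For reference, a minimal PyTorch sketch of the two layouts being contrasted; the class names and sizes are illustrative, not taken from the PPO code in question:

```python
import torch
import torch.nn as nn

class SeparateActorCritic(nn.Module):
    """Actor and critic are independent networks with no shared parameters."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, act_dim))
        self.critic = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, obs):
        return self.actor(obs), self.critic(obs)

class SharedActorCritic(nn.Module):
    """A common trunk feeds separate policy and value heads, so early layers are shared."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.policy_head = nn.Linear(hidden, act_dim)
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, obs):
        h = self.trunk(obs)
        return self.policy_head(h), self.value_head(h)
```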
-
## Fix the model test for `soft_actor_critic.py`
1. setup env according to [Run a model under torch_xla2](https://github.com/pytorch/xla/blob/master/experimental/torch_xla2/docs/support_a_new_model…
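A rough sketch of what the setup plus a smoke test might look like, assuming the `torch_xla2.default_env()` context-manager workflow described in the linked doc (that API name is taken on trust from the doc, not verified here), with a stand-in model in place of the real one from `soft_actor_critic.py`:

```python
import torch
import torch.nn as nn
import torch_xla2  # assumes torch_xla2 is installed per the linked setup doc

class TinySAC(nn.Module):
    """Stand-in for the real soft_actor_critic model, used only for the smoke test."""
    def __init__(self, obs_dim=4, act_dim=2):
        super().__init__()
        self.actor = nn.Linear(obs_dim, act_dim)
        self.critic = nn.Linear(obs_dim, 1)

    def forward(self, obs):
        return self.actor(obs), self.critic(obs)

env = torch_xla2.default_env()  # assumption: API name as shown in the torch_xla2 docs
with env:
    model = TinySAC()
    actions, values = model(torch.randn(1, 4))  # forward pass executed under torch_xla2
```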
-
I'm working on a custom PPO agent where the actor learns both the mean and variance of the action distribution. To implement this, I've overridden the `get_action` method and modified the actor's `for…
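A minimal PyTorch sketch of one common way to do this, where the actor's forward pass returns both the mean and a (log) standard deviation and `get_action` samples from the resulting Normal; the method names mirror the description above, but the code is otherwise hypothetical, not the asker's actual implementation:

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class GaussianActor(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.mean_head = nn.Linear(hidden, act_dim)
        self.log_std_head = nn.Linear(hidden, act_dim)  # predict log std for numerical stability

    def forward(self, obs):
        h = self.body(obs)
        mean = self.mean_head(h)
        log_std = self.log_std_head(h).clamp(-20, 2)  # keep std in a sane range
        return mean, log_std

    def get_action(self, obs):
        mean, log_std = self.forward(obs)
        dist = Normal(mean, log_std.exp())
        action = dist.sample()
        return action, dist.log_prob(action).sum(-1)
```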
-
### Issue type
Bug
### Have you reproduced the bug with TensorFlow Nightly?
Yes
### Source
source
### TensorFlow version
V1
### Custom code
Yes
### OS platform and distribution
_No response…
-
Reduce duplication of similar Actors/Critics with only the hidden layers being different - generally improve the readability of the code for creating the networks.
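One way this is often done is a single network-builder that both actors and critics call, with only the hidden-layer spec varying; a minimal sketch (names and sizes illustrative, not taken from the existing code):

```python
import torch.nn as nn

def build_mlp(in_dim, out_dim, hidden_sizes, activation=nn.ReLU):
    """Build an MLP whose hidden layers are fully described by `hidden_sizes`."""
    layers, prev = [], in_dim
    for h in hidden_sizes:
        layers += [nn.Linear(prev, h), activation()]
        prev = h
    layers.append(nn.Linear(prev, out_dim))
    return nn.Sequential(*layers)

# Actor and critic now differ only in their output size and hidden-layer spec.
actor = build_mlp(in_dim=8, out_dim=2, hidden_sizes=(256, 256))
critic = build_mlp(in_dim=8, out_dim=1, hidden_sizes=(400, 300))
```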
-
Hi,
thanks for the amazing work on RL environments in JAX. I was wondering if you have any plans to write Actor-Critic agents for this work?
-
Hello everyone,
I've encountered a problem while implementing an A2C (Advantage Actor-Critic) network with Flax and Optax. My network includes _policy_network_ and _value_network_, each containi…
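For context, a minimal Flax/Optax sketch of the usual two-network A2C setup (separate modules, separate optimizer states); the layer sizes and learning rates are illustrative, not taken from the asker's code:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn
import optax

class PolicyNetwork(nn.Module):
    n_actions: int

    @nn.compact
    def __call__(self, x):
        x = nn.relu(nn.Dense(64)(x))
        return nn.Dense(self.n_actions)(x)  # action logits

class ValueNetwork(nn.Module):
    @nn.compact
    def __call__(self, x):
        x = nn.relu(nn.Dense(64)(x))
        return nn.Dense(1)(x)  # state value

key = jax.random.PRNGKey(0)
dummy_obs = jnp.zeros((1, 4))

policy = PolicyNetwork(n_actions=2)
value = ValueNetwork()
policy_params = policy.init(key, dummy_obs)
value_params = value.init(key, dummy_obs)

# One optimizer (and optimizer state) per network, so the two parameter sets
# are updated independently.
policy_tx = optax.adam(3e-4)
value_tx = optax.adam(1e-3)
policy_opt_state = policy_tx.init(policy_params)
value_opt_state = value_tx.init(value_params)
```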
-
Fairseq contains many NMT models, but models trained with reinforcement learning are absent.
It would be great if those were added.
-
'exp_name': 'puckworld1D_cont_CDiscDDPG', 'env_name': 'puckworld_continuous_1d', 'env_type': 'csuite', 'agent_name': 'CD_DDPG', 'nonlinear': True, 'exp_type': 'control', 'render': False, 'num_runs': 5…