-
When the learning rate or the initial critic weights change, why do the critic weights converge to different values? How can we justify their optimality?
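One possible answer, sketched below: with a nonlinear critic, TD learning is a non-convex problem, so different initializations or step sizes can converge to different parameter vectors even when the value predictions nearly agree. The toy chain and the tiny MLP below are hypothetical, only there to make the effect reproducible:
```python
import numpy as np

# Hypothetical 3-state chain: state s steps to s+1, the episode ends after
# state 2, and every step pays reward 1. With gamma=0.9 the true values are
# V = [2.71, 1.9, 1.0].
def train_critic(seed, lr=0.05, sweeps=3000, gamma=0.9):
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(4, 3))  # hidden layer of a tiny MLP critic
    w2 = rng.normal(scale=0.5, size=4)       # output layer

    def value(phi):
        return w2 @ np.tanh(W1 @ phi)

    for _ in range(sweeps):
        for s in range(3):
            phi = np.eye(3)[s]
            v_next = 0.0 if s == 2 else value(np.eye(3)[s + 1])
            delta = 1.0 + gamma * v_next - value(phi)  # TD(0) error
            h = np.tanh(W1 @ phi)
            # Semi-gradient TD(0) step on both layers.
            w2 = w2 + lr * delta * h
            W1 = W1 + lr * delta * np.outer(w2 * (1 - h ** 2), phi)
    return w2, [round(float(value(np.eye(3)[s])), 3) for s in range(3)]

for seed in (0, 1):
    w2, values = train_critic(seed)
    print(f"seed={seed}  output weights={w2.round(3)}  values={values}")
```
Both seeds should print values close to [2.71, 1.9, 1.0] with clearly different weights, so one way to argue optimality is through the value (or Bellman) error rather than through the weights themselves.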
-
Even after a bigger run, the agents don't learn:
according to the PressurePlate environment, the reward is in [-0.9, 0] if the agent is in the same room as its assigned plate, and in [-1, ..., -N] otherwise (a sketch of this shape follows below).
I tri…
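For concreteness, a minimal sketch of a reward with the shape described above; the helper name, its arguments, and the distance scaling are assumptions for illustration, not the environment's actual API:
```python
def plate_reward(agent_room: int, plate_room: int,
                 dist_to_plate: float, rooms_between: int) -> float:
    """Hypothetical per-agent reward matching the description above:
    in [-0.9, 0] inside the assigned plate's room (closer is better),
    a flat -1 per separating room otherwise."""
    if agent_room == plate_room:
        max_dist = 10.0  # assumed room diameter used to scale the distance
        return -0.9 * min(dist_to_plate / max_dist, 1.0)
    return -float(max(rooms_between, 1))  # in [-1, ..., -N]
```
Since the reward is never positive, the only gradient signal inside the correct room comes from the distance shaping, which may be part of why learning looks stalled.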
-
### Reproduce error
- flax 0.7.5
- jaxlib 0.4.21+cuda12.cudnn89
- Ubuntu 22.04
Running:
```bash
XLA_PYTHON_CLIENT_PREALLOCATE=false python train_finetuning_pixels.py --env_name=cheetah-run-v0…
-
### 🚀 Feature
Independently configurable learning rates for the actor and the critic in AC-style algorithms
### Motivation
In the literature the actor is often configured to learn more slowly, so that the c…
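A minimal sketch of how this could look with separate optax optimizers, one per network; the learning-rate values and the parameter trees are assumptions for illustration:
```python
import jax.numpy as jnp
import optax

# Hypothetical parameter trees standing in for the real actor/critic networks.
actor_params = {"w": jnp.zeros((4, 2))}
critic_params = {"w": jnp.zeros((4, 1))}

# Separate optimizers give independently configurable learning rates;
# here the critic learns 10x faster than the actor (assumed values).
actor_tx = optax.adam(3e-5)
critic_tx = optax.adam(3e-4)

actor_opt_state = actor_tx.init(actor_params)
critic_opt_state = critic_tx.init(critic_params)

# In the update step, each network then applies only its own transform, e.g.:
#   updates, actor_opt_state = actor_tx.update(actor_grads, actor_opt_state)
#   actor_params = optax.apply_updates(actor_params, updates)
```
Keeping the two optimizer states separate also makes it straightforward to give each network its own schedule later.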
-
## In a nutshell
Abstractive summarization via reinforcement learning with Actor-Critic.
To improve the quality of the summaries, one of the critics is a binary classifier that judges whether a sentence was generated by the model or written by a human.
This makes the generated sentences less likely to contain noise such as OOV tokens or ###.#.
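A minimal sketch of what such a discriminator critic's loss could look like, assuming the sentences have already been encoded into scalar logits; every name here is illustrative, not the paper's implementation:
```python
import numpy as np

def discriminator_critic_loss(logits_human: np.ndarray,
                              logits_generated: np.ndarray) -> float:
    """Binary cross-entropy for a critic separating human sentences
    (label 1) from model-generated ones (label 0)."""
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))
    eps = 1e-8
    loss_human = -np.log(sigmoid(logits_human) + eps).mean()
    loss_generated = -np.log(1.0 - sigmoid(logits_generated) + eps).mean()
    return float(loss_human + loss_generated)

# Example: confident logits on two human and two generated sentences.
print(discriminator_critic_loss(np.array([2.0, 1.5]), np.array([-1.0, 0.3])))
```
The generator can then be rewarded with the discriminator's score on its own sentences, pushing it toward human-like text and away from OOV/###.# noise.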
### Paper link
https://arxiv.org/abs/1803.11070
###…
-
In line 276 of CCM_MADDPG.py, I wonder why it is "newactor_action_var = self.actors[agent_id](states_var[:, agent_id, :])" instead of "newactor_action_var = self.actors[agent_id](next_states_var[:, agent_id…
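For context, a minimal sketch of the usual MADDPG split under standard notation (not this repo's exact code): next states paired with the target actors appear only in the TD target, while current states paired with the current actor feed the critic in the policy-gradient step. The linear stand-in networks below are hypothetical:
```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, batch, obs_dim, act_dim, gamma = 2, 8, 3, 2, 0.95

# Linear stand-ins for the actor / target-actor networks.
actors = [rng.normal(size=(obs_dim, act_dim)) for _ in range(n_agents)]
target_actors = [w.copy() for w in actors]

def critic(states, actions):
    # Toy centralized critic over all agents' states and actions.
    return states.sum(axis=(1, 2)) + actions.sum(axis=(1, 2))

states = rng.normal(size=(batch, n_agents, obs_dim))
next_states = rng.normal(size=(batch, n_agents, obs_dim))
rewards = rng.normal(size=batch)
actions = np.stack([states[:, i, :] @ actors[i] for i in range(n_agents)], axis=1)

# (a) Critic TD target: NEXT states through the TARGET actors.
next_actions = np.stack(
    [next_states[:, i, :] @ target_actors[i] for i in range(n_agents)], axis=1)
q_target = rewards + gamma * critic(next_states, next_actions)

# (b) Actor update for one agent: CURRENT states through the CURRENT actor,
# so the critic scores the action the actor would take right now.
agent_id = 0
joint = actions.copy()
joint[:, agent_id, :] = states[:, agent_id, :] @ actors[agent_id]
actor_objective = critic(states, joint).mean()  # maximized w.r.t. this actor
```
If line 276 sits in the actor-update branch rather than in the critic-target computation, using states_var there matches this standard pattern.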
-
Hello,
I would like to ask: what role does the VAE play in the middle of the actor-critic network, and what happens if it is removed?
-
In https://github.com/dennybritz/reinforcement-learning/blob/master/PolicyGradient/Continuous%20MountainCar%20Actor%20Critic%20Solution.ipynb,
I found that at every time step, the actor and value function a…
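For reference, a minimal sketch of a one-step online actor-critic update of that general kind, with a linear value function and a Gaussian policy on toy dynamics; none of this is the notebook's actual code:
```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, gamma, lr_v, lr_pi, sigma = 2, 0.99, 1e-2, 1e-3, 0.5

w_v = np.zeros(obs_dim)   # linear value function V(s) = w_v @ s
w_mu = np.zeros(obs_dim)  # Gaussian policy mean mu(s) = w_mu @ s

def step_env(s, a):
    # Hypothetical stand-in dynamics; the notebook uses MountainCarContinuous.
    return s + 0.1 * a, -abs(a), False

s = rng.normal(size=obs_dim)
for t in range(100):
    mu = w_mu @ s
    a = mu + sigma * rng.normal()                 # sample a continuous action
    s_next, r, done = step_env(s, a)
    td_target = r + (0.0 if done else gamma * (w_v @ s_next))
    delta = td_target - w_v @ s                   # one-step TD error
    w_v += lr_v * delta * s                       # value update, every step
    w_mu += lr_pi * delta * (a - mu) / sigma**2 * s  # actor update, every step
    s = s_next
```
Updating both the actor and the value function at every environment step like this is the standard one-step (online) actor-critic scheme.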
-
When I use 4 * A100 80G to run step 3 with llama2-7b (actor_model) and tiny-llama-1.1B (ref_model), it uses 53848 MB of memory during generation and 79610 MB during training. When I use 8 * A100 80G to …
-
The training configuration is as follows:
```
--ref_num_nodes 1 --ref_num_gpus_per_node 2 --reward_num_nodes 1 --reward_num_gpus_per_node 2 --critic_num_nodes 1 --critic_num_gpus_per_node 4 --actor_num_nodes 2 --actor_num_gpus_per_n…