-
Hi, I am new to tianshou and RL. I created an env and ran PPO from tianshou on it, but I found that the sampled actions are out of range. After searching, I found `map_action`, but it seems not used in tr…
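For context on what `map_action` does: when action scaling is enabled on a tianshou policy, the raw network output (roughly in [-1, 1]) is rescaled into the env's `action_space` bounds before being passed to `step`. A minimal pure-Python sketch of that scale-and-clip mapping (the function name and defaults here are illustrative, not tianshou's exact code):

```python
def map_action(raw_action, low, high, bound_method="clip"):
    """Map a raw policy output (roughly in [-1, 1]) into [low, high].

    Sketch of the usual scale-and-clip scheme; tianshou applies a similar
    transform when action_scaling is enabled on the policy.
    """
    if bound_method == "clip":
        # clip the raw output to [-1, 1] first
        raw_action = max(-1.0, min(1.0, raw_action))
    # linear rescale from [-1, 1] to [low, high]
    return low + (raw_action + 1.0) * 0.5 * (high - low)

print(map_action(0.0, -2.0, 2.0))   # midpoint of the env's range
print(map_action(3.0, -2.0, 2.0))   # out-of-range raw action gets clipped
```

Note that the mapping is applied when actions are sent to the env, so the raw (unmapped) actions are what the policy stores and trains on.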
-
Error info:
File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/hybrid_engine.py", line 99, in new_inference_container
File "/opt/conda/lib/python3.8/site-packages/deepspeed/module_…
-
The training configuration is as follows:
```
--ref_num_nodes 1 --ref_num_gpus_per_node 2 --reward_num_nodes 1 --reward_num_gpus_per_node 2 --critic_num_nodes 1 --critic_num_gpus_per_node 4 --actor_num_nodes 2 --actor_num_gpus_per_n…
```
-
Hi team, I am getting the following error while enabling 4-bit quantization and LoRA:
```
File "/root/miniconda3/envs/open/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 262, in __init__
self._c…
```
-
Currently, `week5_policy_based/practice_a3c.ipynb` has numerous problems.
* It does not implement A3C. It is a plain actor-critic.
* We only have it in TensorFlow, since it does not have a corresp…
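To make the first point concrete: the defining feature of A3C, as opposed to a plain actor-critic, is several asynchronous workers, each collecting its own rollouts and applying gradient updates to a shared parameter set. A toy sketch of that asynchronous update pattern (the "gradient" here is a dummy random scalar, just to show the structure, not a real policy gradient):

```python
import threading
import random

shared_params = {"w": 0.0}   # globally shared parameters
lock = threading.Lock()      # real A3C is typically lock-free (Hogwild-style);
                             # a lock keeps this toy example safe and simple

def worker(worker_id, n_steps):
    rng = random.Random(worker_id)
    for _ in range(n_steps):
        # each worker computes its own "gradient" from its own rollout...
        grad = rng.uniform(-1.0, 1.0)
        # ...and applies it directly to the shared parameters, asynchronously
        with lock:
            shared_params["w"] += 0.01 * grad

threads = [threading.Thread(target=worker, args=(i, 100)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared_params["w"])  # result of 400 asynchronous updates
```

A notebook that runs a single worker on a single environment copy is an ordinary (synchronous) actor-critic, whatever it is titled.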
-
I am conducting reinforcement learning for a robot using rsl_rl and Isaac Lab. While it works fine with simple settings, when I switch to more complex settings (such as domain randomization), the foll…
-
In the file PPO_model.py, `forward` is empty, so how can everything work through the `evaluate` function? I really don't get it. In that case, how are the actor and critic inside the HGNNScheduler network trained?
The `evaluate` function uses the actor and critic, so what is the meaning of the actor network? Its output is initialized with only 1 dimension, so how does it output a distribution over actions? Is it through the…
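One common pattern in scheduling models like this (a guess at the design, since I have not verified this repo's exact code) is that the actor head outputs a single scalar score per candidate action, e.g. per operation-machine pair, and the distribution comes from a softmax over those per-candidate scores rather than from a multi-dimensional head. A minimal sketch:

```python
import math

def action_distribution(scores):
    """Turn one scalar score per candidate action into a categorical
    distribution via a numerically stabilized softmax."""
    m = max(scores)                              # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# e.g. the actor scored three candidate operation-machine pairs:
probs = action_distribution([2.0, 1.0, 0.1])
print(probs)       # higher score -> higher probability; probabilities sum to 1
```

Under this design the 1-dimensional output is a score, not an action, and the number of candidates (and hence the distribution's support) can vary per step.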
-
The actor loss is defined as ([Line 164](https://github.com/openai/baselines/blob/699919f1cf2527b184f4445a3758a773f333a1ba/baselines/ddpg/ddpg.py#L164)):
```python
self.actor_loss = -tf.reduce_mea…
```
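In words, the DDPG actor loss is the negative of the critic's mean Q-value at the actor's own actions, so minimizing it pushes the actor toward actions the critic rates highly. A framework-free numeric sketch of the same quantity (the linear actor and quadratic critic here are toy stand-ins, not DDPG's networks):

```python
def actor(state, w):
    # toy deterministic policy: action = w * state
    return w * state

def critic(state, action):
    # toy Q-function that prefers actions equal to the state value
    return -(action - state) ** 2

def actor_loss(states, w):
    """-mean_s Q(s, pi(s)) -- the shape of the DDPG actor objective."""
    qs = [critic(s, actor(s, w)) for s in states]
    return -sum(qs) / len(qs)

states = [1.0, 2.0, 3.0]
print(actor_loss(states, w=1.0))  # actions match states exactly: minimal loss
print(actor_loss(states, w=0.5))  # worse policy: strictly higher loss
```

The minus sign is the whole trick: gradient *descent* on `-Q` is gradient *ascent* on `Q`.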
-
There are recurrent (LSTM) policy options for sb3 (e.g. [RecurrentPPO](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/master/sb3_contrib/ppo_recurrent/ppo_recurrent.py)). It w…
-
Hello, I am trying to use the SAC agent and resume training. To do that, I do:
```
def load_model(actor_path, critic_path, optimizer_actor_path, optimizer_critic_path, optimizer_alpha_path):
    po…
```
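For reference, the general resume-training pattern (a framework-free sketch; with PyTorch you would use `state_dict()`/`load_state_dict()` and `torch.save`/`torch.load` rather than the pickle used here) is to checkpoint the actor, the critic, and *every* optimizer's state together, and restore all of them before continuing, so networks and optimizers stay in sync:

```python
import os
import pickle
import tempfile

def save_checkpoint(path, actor, critic, optim_states):
    # bundle networks and optimizer states in one file so they cannot drift apart
    with open(path, "wb") as f:
        pickle.dump({"actor": actor, "critic": critic, "optim": optim_states}, f)

def load_checkpoint(path):
    with open(path, "rb") as f:
        ckpt = pickle.load(f)
    return ckpt["actor"], ckpt["critic"], ckpt["optim"]

# toy "parameters": plain dicts stand in for network/optimizer state dicts
actor = {"w": 0.3}
critic = {"v": -1.2}
optim_states = {"actor_opt": {"lr": 3e-4}, "critic_opt": {"lr": 3e-4},
                "alpha_opt": {"lr": 3e-4}}  # SAC also has the alpha optimizer

path = os.path.join(tempfile.mkdtemp(), "sac_ckpt.pkl")
save_checkpoint(path, actor, critic, optim_states)
actor2, critic2, optim2 = load_checkpoint(path)
print(actor2, optim2["alpha_opt"])
```

Forgetting the entropy-coefficient (alpha) optimizer state is a common cause of a performance dip right after resuming SAC.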