-
In the PPO code, if I add the action to the critic network's input vector by simply appending it as one extra dimension, training performs very poorly; normalizing the four state values afterwards still did not help.
How should the input be processed after adding the action to the critic network's input so that training achieves good results?
Thanks to the author for providing the code.
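Not the repository's code, but a minimal PyTorch sketch of one common way to feed a discrete action into a value network: keep the state normalized and one-hot encode the action before concatenating, instead of appending the raw action index as a single scalar. The class name, layer sizes, and the CartPole-like 4-dimensional state are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActionConditionedCritic(nn.Module):
    """Hypothetical critic that takes (state, action) rather than state alone."""

    def __init__(self, state_dim: int = 4, num_actions: int = 2, hidden: int = 64):
        super().__init__()
        self.num_actions = num_actions
        self.net = nn.Sequential(
            nn.Linear(state_dim + num_actions, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # state: (batch, state_dim), already normalized to roughly unit scale
        # action: (batch,) integer indices -> one-hot so its scale matches the state
        a = F.one_hot(action.long(), num_classes=self.num_actions).float()
        return self.net(torch.cat([state, a], dim=-1)).squeeze(-1)
```

Keeping the action one-hot (or embedded) avoids handing the network an unbounded integer next to unit-scale state features.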
-
We still have a bunch of try/except blocks in losses such as PPO to compute the entropy. We need to remove them for compile compatibility.
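Not claiming this is how the loss should look, but a minimal sketch of one try/except-free pattern, assuming the current code falls back to a Monte Carlo estimate when `entropy()` is not implemented; the helper name and `num_samples` argument are hypothetical:

```python
import torch
from torch import distributions as d

def entropy_term(dist: d.Distribution, num_samples: int = 1) -> torch.Tensor:
    # Decide up front whether the distribution class overrides entropy(),
    # instead of calling it inside try/except and catching NotImplementedError,
    # which is hostile to compilation.
    if type(dist).entropy is not d.Distribution.entropy:
        return dist.entropy()
    # Fallback: Monte Carlo estimate of -E[log p(x)].
    x = dist.rsample((num_samples,)) if dist.has_rsample else dist.sample((num_samples,))
    return -dist.log_prob(x).mean(0)
```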
-
May I ask when the code of SA-PPO will be released?
Thank you!
-
I'd like to ask how the PPO value loss is computed, because the PPO paper doesn't seem to define the squared-error loss calculation very clearly (I find it a bit hard to understand).
![](https://i.imgur.com/AsrzJYl.png)
I noticed that your PPO squared-error loss is computed differently in the ADL and MLDS versions, and the many implementations I've looked at online (tensorlayer, etc.) also don't quite…
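For reference, here is a sketch of the two variants that show up most often; the function name and signature are mine, not code from either repository. Variant 1 is the plain squared error between predicted values and returns, as in the paper's L^VF term; variant 2 is the clipped version popularized by the OpenAI baselines implementation:

```python
import torch

def value_loss(values, old_values, returns, clip_eps=0.2, use_clipping=True):
    # Variant 1: plain squared error (V(s) - R)^2.
    plain = (values - returns).pow(2)
    if not use_clipping:
        return 0.5 * plain.mean()
    # Variant 2: clip how far the new value prediction may move from the old
    # one, then take the elementwise maximum of the two squared errors.
    v_clipped = old_values + (values - old_values).clamp(-clip_eps, clip_eps)
    return 0.5 * torch.max(plain, (v_clipped - returns).pow(2)).mean()
```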
-
In the RL training pipeline (for SAC and PPO), during evaluation runs, there seems to be an issue with the computed/tracked mse values. They match neither the mse in "info" from env.step nor the rmse re…
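One arithmetic detail that can produce exactly this kind of mismatch (purely a guess at the cause, with synthetic numbers): averaging per-step RMSE is not the same as taking the square root of the averaged per-step MSE.

```python
import numpy as np

per_step_mse = np.array([0.01, 0.09, 0.25])           # placeholder values
rmse_of_mean_mse = np.sqrt(per_step_mse.mean())       # ~0.342
mean_of_per_step_rmse = np.sqrt(per_step_mse).mean()  # = 0.300
print(rmse_of_mean_mse, mean_of_per_step_rmse)        # the two aggregates differ
```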
-
A question about the agent.py file in the PPO code:
Why is total_loss = actor_loss + 0.5*critic_loss computed? I didn't see this analyzed in the PPO walkthrough, and I couldn't find the corresponding operation in the original PPO paper either.
Also, why do both the actor and critic networks use the gradient of total_loss? Is that reasonable?
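For context, a weighting of this kind does appear in the PPO paper as the combined objective L^{CLIP+VF+S}, where the value term carries a coefficient c1. Below is a minimal sketch with placeholder tensors (not the repository's variables) of how a single shared loss is typically formed:

```python
import torch

# Placeholder scalars standing in for the actual loss terms. The paper maximizes
# L^CLIP - c1 * L^VF + c2 * S, so minimizing actor_loss + c1 * critic_loss
# - c2 * entropy is the same thing with signs flipped. c1 = 0.5 and c2 = 0.01
# are common defaults, not universal constants.
actor_loss = torch.tensor(0.12, requires_grad=True)   # stands in for -L^CLIP
critic_loss = torch.tensor(0.30, requires_grad=True)  # stands in for L^VF
entropy = torch.tensor(1.05, requires_grad=True)      # stands in for S

total_loss = actor_loss + 0.5 * critic_loss - 0.01 * entropy
total_loss.backward()  # one backward pass; natural when actor and critic share parameters
```

If the actor and critic share no parameters and the advantages are detached, each term's gradient only reaches its own network, so summing the losses is equivalent to optimizing them with two separate optimizers.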
-
### Feature request
Enable PPOTrainer and DPOTrainer to work with audio-language models like Qwen2Audio. The architecture of this model is identical to that of vision-language models like LLaVA, consisting of…
-
**Machine: MAX1100**
**ipex-llm: 2.1.0b20240421**
**bigdl-core-xe-21: 2.5.0b20240421**
**bigdl-core-xe-esimd-21: 2.5.0b20240421**
[Related PR](https://github.com/intel-analytics/ipex-llm…
-
### Description
I'm trying to restore an RLLib algorithm from a checkpoint and change the configuration before resuming training. My main objective is to change the number of rollout workers between …
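Not a confirmed answer, just a sketch of the pattern I would try under the older RLlib 2.x config API: build the algorithm from a freshly modified config, then restore only the state from the checkpoint. The environment name, worker count, and checkpoint path are placeholders:

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")        # placeholder environment
    .rollouts(num_rollout_workers=4)   # the setting to change before resuming
)
algo = config.build()
algo.restore("/path/to/checkpoint")    # placeholder checkpoint path

for _ in range(10):
    result = algo.train()
```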
-
Hi. I'm trying to simulate the ppo_4x4grid experiment. I had fixed many errors before, but now I can't understand what the errors here are or how to fix them. I will be so thankful if anyone can he…