-
In the PPO code, if I add the action to the critic network's input vector by simply appending it as one extra dimension, training performs very poorly; normalizing the four state values afterwards still did not help.
How should the input be processed after adding the action to the critic network's input so that training achieves good results?
Thanks to the author for providing the code.
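Not the repository's code, but a minimal PyTorch sketch of one common way to feed a discrete action into a value network: keep the state normalized and one-hot encode the action before concatenating, instead of appending the raw action index as a single scalar. The class name, layer sizes, and the CartPole-like 4-dimensional state are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActionConditionedCritic(nn.Module):
    """Hypothetical critic that takes (state, action) rather than state alone."""

    def __init__(self, state_dim: int = 4, num_actions: int = 2, hidden: int = 64):
        super().__init__()
        self.num_actions = num_actions
        self.net = nn.Sequential(
            nn.Linear(state_dim + num_actions, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # state: (batch, state_dim), already normalized to roughly unit scale
        # action: (batch,) integer indices -> one-hot so its scale matches the state
        a = F.one_hot(action.long(), num_classes=self.num_actions).float()
        return self.net(torch.cat([state, a], dim=-1)).squeeze(-1)
```

Keeping the action one-hot (or embedded) avoids handing the network an unbounded integer next to unit-scale state features.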
-
We still have a bunch of try/except blocks in losses such as PPO to compute the entropy. We need to remove them for compile compatibility.
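Not claiming this is how the loss should look, but a minimal sketch of one try/except-free pattern, assuming the current code falls back to a Monte Carlo estimate when `entropy()` is not implemented; the helper name and `num_samples` argument are hypothetical:

```python
import torch
from torch import distributions as d

def entropy_term(dist: d.Distribution, num_samples: int = 1) -> torch.Tensor:
    # Decide up front whether the distribution class overrides entropy(),
    # instead of calling it inside try/except and catching NotImplementedError,
    # which is hostile to compilation.
    if type(dist).entropy is not d.Distribution.entropy:
        return dist.entropy()
    # Fallback: Monte Carlo estimate of -E[log p(x)].
    x = dist.rsample((num_samples,)) if dist.has_rsample else dist.sample((num_samples,))
    return -dist.log_prob(x).mean(0)
```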
-
May I ask when the code of SA-PPO will be released?
Thank you!
-
I'd like to ask how the PPO value loss is computed, because the PPO paper doesn't seem to define the squared-error loss calculation very clearly (I find it a bit hard to understand).
![](https://i.imgur.com/AsrzJYl.png)
I noticed that your PPO squared-error loss is computed differently in the ADL and MLDS versions, and the many implementations I've looked at online (tensorlayer, etc.) also don't quite…
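For reference, here is a sketch of the two variants that show up most often; the function name and signature are mine, not code from either repository. Variant 1 is the plain squared error between predicted values and returns, as in the paper's L^VF term; variant 2 is the clipped version popularized by the OpenAI baselines implementation:

```python
import torch

def value_loss(values, old_values, returns, clip_eps=0.2, use_clipping=True):
    # Variant 1: plain squared error (V(s) - R)^2.
    plain = (values - returns).pow(2)
    if not use_clipping:
        return 0.5 * plain.mean()
    # Variant 2: clip how far the new value prediction may move from the old
    # one, then take the elementwise maximum of the two squared errors.
    v_clipped = old_values + (values - old_values).clamp(-clip_eps, clip_eps)
    return 0.5 * torch.max(plain, (v_clipped - returns).pow(2)).mean()
```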
-
In the RL training pipeline (for SAC and PPO), during evaluation runs, there seems to be an issue with the computed/tracked mse values. They match neither the mse in "info" from env.step nor the rmse re…
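One arithmetic detail that can produce exactly this kind of mismatch (purely a guess at the cause, with synthetic numbers): averaging per-step RMSE is not the same as taking the square root of the averaged per-step MSE.

```python
import numpy as np

per_step_mse = np.array([0.01, 0.09, 0.25])           # placeholder values
rmse_of_mean_mse = np.sqrt(per_step_mse.mean())       # ~0.342
mean_of_per_step_rmse = np.sqrt(per_step_mse).mean()  # = 0.300
print(rmse_of_mean_mse, mean_of_per_step_rmse)        # the two aggregates differ
```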
-
A question about the agent.py file in the PPO code:
Why is total_loss = actor_loss + 0.5*critic_loss computed? I didn't see this analyzed in the PPO walkthrough, and I couldn't find the corresponding operation in the original PPO paper either.
Also, why do both the actor and critic networks use the gradient of total_loss? Is that reasonable?
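For context, a weighting of this kind does appear in the PPO paper as the combined objective L^{CLIP+VF+S}, where the value term carries a coefficient c1. Below is a minimal sketch with placeholder tensors (not the repository's variables) of how a single shared loss is typically formed:

```python
import torch

# Placeholder scalars standing in for the actual loss terms. The paper maximizes
# L^CLIP - c1 * L^VF + c2 * S, so minimizing actor_loss + c1 * critic_loss
# - c2 * entropy is the same thing with signs flipped. c1 = 0.5 and c2 = 0.01
# are common defaults, not universal constants.
actor_loss = torch.tensor(0.12, requires_grad=True)   # stands in for -L^CLIP
critic_loss = torch.tensor(0.30, requires_grad=True)  # stands in for L^VF
entropy = torch.tensor(1.05, requires_grad=True)      # stands in for S

total_loss = actor_loss + 0.5 * critic_loss - 0.01 * entropy
total_loss.backward()  # one backward pass; natural when actor and critic share parameters
```

If the actor and critic share no parameters and the advantages are detached, each term's gradient only reaches its own network, so summing the losses is equivalent to optimizing them with two separate optimizers.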
-
### Feature request
Enable PPOTrainer and DPOTrainer to work with audio-language models like Qwen2Audio. The architecture of this model is identical to that of vision-language models like LLaVA, consisting of…
-
**Machine: MAX1100**
**ipex-llm: 2.1.0b20240421**
**bigdl-core-xe-21: 2.5.0b20240421**
**bigdl-core-xe-esimd-21: 2.5.0b20240421**
[Related PR](https://github.com/intel-analytics/ipex-llm…
-
### Description
I'm trying to restore an RLLib algorithm from a checkpoint and change the configuration before resuming training. My main objective is to change the number of rollout workers between …
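Not a confirmed answer, just a sketch of the pattern I would try under the older RLlib 2.x config API: build the algorithm from a freshly modified config, then restore only the state from the checkpoint. The environment name, worker count, and checkpoint path are placeholders:

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")        # placeholder environment
    .rollouts(num_rollout_workers=4)   # the setting to change before resuming
)
algo = config.build()
algo.restore("/path/to/checkpoint")    # placeholder checkpoint path

for _ in range(10):
    result = algo.train()
```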
-
Hi. I'm trying to simulate the ppo_4x4grid experiment. I had fixed many errors before, but now I can't understand what the errors here are or how to fix them. I will be so thankful if anyone can he…