-
PPO reports both 'mean reward' and 'std of reward',
but POCA only reports 'mean reward' and 'mean group reward'.
I didn't find a way to get 'std of reward' in POCA.
I'd appreciate it if it could be obtained via YAML bu…
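Until such a statistic is exposed, it can be computed offline from logged per-episode rewards; a minimal sketch (the reward values below are illustrative, not produced by ML-Agents):

```
import statistics

# Hypothetical per-episode cumulative rewards collected from training logs.
episode_rewards = [1.2, 0.8, 1.5, 0.9, 1.1]

mean_reward = statistics.mean(episode_rewards)
std_reward = statistics.stdev(episode_rewards)  # sample standard deviation
```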
-
```
if self.is_encoder_decoder:
    input_ids = input_kwargs["decoder_input_ids"]
    attention_mask = input_kwargs["decoder_attention_mask"]
else:
    …
```
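The branch above selects decoder-side tensors for encoder-decoder (seq2seq) models and the regular inputs otherwise; a standalone sketch of that dispatch (the function and dict names here are illustrative, not TRL internals):

```
def select_inputs(input_kwargs: dict, is_encoder_decoder: bool):
    """Pick the token ids and attention mask the loss should use."""
    if is_encoder_decoder:
        # Seq2seq models (e.g. T5): the targets live on the decoder side.
        return (input_kwargs["decoder_input_ids"],
                input_kwargs["decoder_attention_mask"])
    # Decoder-only models (e.g. GPT-style): use the plain inputs.
    return input_kwargs["input_ids"], input_kwargs["attention_mask"]

kwargs = {
    "input_ids": [1, 2, 3],
    "attention_mask": [1, 1, 1],
    "decoder_input_ids": [4, 5],
    "decoder_attention_mask": [1, 1],
}
ids, mask = select_inputs(kwargs, is_encoder_decoder=True)
```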
-
**Minecraft Version:** 1.21.0
**Forge Version:** 51.0.8
**Description of issue:**
The `explode` method in the `TntBlock` class is annotated with `@Deprecated`:
```
@Deprecated //Forge: Prefer…
-
In the RL training pipeline (for SAC and PPO), the MSE values computed/tracked during evaluation runs appear to be wrong: they match neither the mse in the "info" dict returned by env.step nor the rmse re…
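One common source of such mismatches is aggregation order: the mean of per-episode RMSEs is not the RMSE of the pooled squared errors. A small illustration with toy numbers (not values from the pipeline):

```
import math

# Squared errors from two hypothetical evaluation episodes.
ep1 = [0.0, 4.0]   # per-episode MSE = 2.0
ep2 = [16.0]       # per-episode MSE = 16.0

# Average the two per-episode RMSEs...
mean_of_rmse = (math.sqrt(sum(ep1) / len(ep1))
                + math.sqrt(sum(ep2) / len(ep2))) / 2

# ...versus taking the RMSE over all pooled errors.
pooled = ep1 + ep2
rmse_of_pooled = math.sqrt(sum(pooled) / len(pooled))

# The two aggregates differ, so logged values computed one way will
# not match values computed the other way.
```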
-
-
Refactorizar y mejorar el modelo Kistmat_AI, incluyendo la reorganización del código, implementación de tests, mejora del razonamiento simbólico, integración de PPO, y adición de nuevos sistemas de me…
-
Hi! I'm a bit puzzled as to how a timeout could be handled correctly in your implementation of PPO (this is relevant for all variants, really). I am especially surprised by envpool, because it seems like t…
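The usual remedy for timeouts is to distinguish time-limit truncation from true termination and keep bootstrapping the value estimate at truncated steps; a minimal sketch of that target computation (names and values are illustrative, not from any particular PPO codebase):

```
def td_target(reward, next_value, terminated, truncated, gamma=0.99):
    """One-step TD target that bootstraps through time-limit truncation.

    - terminated: the MDP genuinely ended, so there is no future value.
    - truncated: the episode was cut by a time limit, so the underlying
      state is not terminal and the value estimate should still be used.
    """
    if terminated and not truncated:
        return reward
    return reward + gamma * next_value

# A truncated step keeps the bootstrap term; a terminal step drops it.
t_truncated = td_target(1.0, 10.0, terminated=False, truncated=True)
t_terminal = td_target(1.0, 10.0, terminated=True, truncated=False)
```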
-
### Feature request
Enable PPOTrainer and DPOTrainer to work with audio-language models like Qwen2Audio. The architecture of this model is identical to vision-language models such as LLaVA, consisting of…
-
We still have a number of try/except blocks in losses such as PPO for computing the entropy.
These need to be removed for compile compatibility.
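A compile-friendly pattern is to check up front whether the distribution overrides the base-class `entropy` method, rather than calling it and catching `NotImplementedError` at trace time; a hedged sketch using a toy distribution hierarchy (not the library's actual classes):

```
import math

class Distribution:
    """Toy base class: no closed-form entropy."""
    def entropy(self):
        raise NotImplementedError

class Normal(Distribution):
    """Toy Gaussian with an analytic entropy."""
    def __init__(self, sigma):
        self.sigma = sigma

    def entropy(self):
        return 0.5 * math.log(2 * math.pi * math.e * self.sigma ** 2)

def has_closed_form_entropy(dist):
    # Detect an override statically instead of catching the exception,
    # which control-flow tracers generally cannot handle.
    return type(dist).entropy is not Distribution.entropy
```

Callers can then branch on `has_closed_form_entropy(dist)` and fall back to a Monte Carlo estimate (e.g. `-log_prob(sample)`) in the other branch, with no exception handling in the traced path.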
-
# 17. Implementing PPO (Deep Reinforcement Learning)
[https://hiddenbeginner.github.io/Deep-Reinforcement-Learnings/book/Chapter2/12-implementation-ppo.html](https://hiddenbeginner.github.io/Deep-Reinforcement-Learnings/book/Cha…