-
> ```
> # Copyright (c) Microsoft Corporation.
> # SPDX-License-Identifier: Apache-2.0
>
> # DeepSpeed Team
>
>
> ACTOR_ZERO_STAGE="--actor_zero_stage 0"
> CRITIC_ZERO_STAGE="--critic_zero_…
-
(pytorch) ➜ srcs git:(master) ✗ python learn_by_ppo.py
0%| | 0/11871 [00:00
-
How is `low_level_load_path` in train.py and config_ppo.yaml generated?
evaluate.py sets `lower_model` and `upper_model`.
I get the error `Encoder type cnn not supported!`
I tried all 4 `upper_model` checkpoints; after loading, `encoder_type` is `cnn` rather than `pixel`.
Is there a more detailed description of training or evaluation?
-
Dear ronsailer,
I'm very sorry to trouble you. First, thanks for your contribution. I am running the code on Pong and cannot get a good result, so I want to ask whether you have run this experiment.…
-
Code to reproduce:
```python
import trl
from unsloth import FastLanguageModel
import torch
from tqdm import tqdm
from transformers import AutoTokenizer
from datasets import load_dataset
fr…
-
Hi Lucas,
I've been working on my 3D indoor environment. It's still very basic, but it works, and I just made the repository public: https://github.com/maximecb/gym-miniworld
I've tried to adjus…
-
**Describe the issue**:
Hi,
Could you please add ppo_tuner's performance to the comparison of HPO algorithms:
https://nni.readthedocs.io/en/latest/sharings/hpo_comparison.html
Thanks a lot!
…
-
Could those of you not using the double policy take a look: does your code converge? I've heard that applying tanh directly and then sampling from the distribution affects the entropy computation, but I don't know why. May I ask about this?
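For context, the entropy concern comes from the change of variables when squashing a Gaussian sample through tanh: the log-density of the squashed action needs a Jacobian correction, so an entropy computed from the raw Gaussian overestimates the true entropy of the squashed action. A minimal, library-free sketch (function names here are illustrative, not from the repo):

```python
import math
import random

def tanh_gaussian_logprob(x, mu=0.0, sigma=1.0):
    """Log-prob of y = tanh(x) where x ~ N(mu, sigma^2)."""
    # log-prob of the pre-squash Gaussian sample x
    logp = -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
    # change-of-variables correction: log p(y) = log p(x) - log(1 - tanh(x)^2)
    correction = math.log(1.0 - math.tanh(x) ** 2 + 1e-9)
    return logp - correction

# Monte-Carlo entropy estimates: naive (ignores the Jacobian) vs corrected.
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(10000)]
naive = -sum(-0.5 * math.log(2 * math.pi) - x * x / 2 for x in xs) / len(xs)
corrected = -sum(tanh_gaussian_logprob(x) for x in xs) / len(xs)
print(naive > corrected)  # the squashed distribution has lower entropy
```

In PyTorch, `TransformedDistribution(Normal(...), TanhTransform())` applies the same correction automatically in `log_prob`, which is the usual way to avoid this pitfall.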
-
Running into the issue of `libmem_filesys.so: cannot open shared object file`. I tried googling but could not find any info on this file
Additionally, any chance the Preview 3 and prior version…
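One way to check whether the library exists but simply isn't on the dynamic loader's search path is to try loading it by absolute path; the path below is a placeholder, not the real install location:

```python
import ctypes

# Placeholder path: substitute wherever the package actually installs its libs.
lib_path = "/opt/engine/lib/libmem_filesys.so"
try:
    ctypes.CDLL(lib_path)  # an absolute path bypasses the LD_LIBRARY_PATH search
    status = "loaded"
except OSError as exc:
    status = f"still missing: {exc}"
print(status)
```

If loading by absolute path succeeds, exporting the containing directory in `LD_LIBRARY_PATH` before launching the process should resolve the original `cannot open shared object file` error.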
-
### What happened + What you expected to happen
I want to use an environment with an observation space of 2 dimensions with the new API stack but I'm unable to do so as the `_get_encoder_config` me…
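Until `_get_encoder_config` handles 2-D observation spaces, a common workaround is to reshape the observation yourself before it reaches the encoder: flatten to 1-D so an MLP encoder applies, or add a channel axis so a CNN encoder's 3-D expectation is met. A NumPy sketch with illustrative shapes:

```python
import numpy as np

# Illustrative 2-D observation, e.g. an 8x8 grid with no channel axis.
obs = np.zeros((8, 8), dtype=np.float32)

# Option 1: flatten to 1-D so the MLP encoder path is selected.
flat = obs.reshape(-1)   # shape (64,)

# Option 2: add a trailing channel axis so the CNN path sees a 3-D input.
hwc = obs[..., None]     # shape (8, 8, 1)
print(flat.shape, hwc.shape)
```

In practice either transform would live in a `gymnasium.ObservationWrapper` (with `observation_space` adjusted to match), so the environment itself presents a shape the encoder selection already supports.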