-
### What does it mean when we roll out PPO with numsteps > episode length
I know from the code that it will recycle the environment after you pass the terminal timestep. The question that I have is…
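To make the recycling concrete, here is a minimal sketch of what I understand the rollout loop to do (plain classic gym API; `num_steps` and the reset-on-done logic are my reading of the behavior, not the library's actual code):

```python
import gym

env = gym.make("CartPole-v1")
obs = env.reset()

num_steps = 2048  # rollout horizon; may exceed the episode length
trajectory = []

for step in range(num_steps):
    action = env.action_space.sample()  # stand-in for the PPO policy
    next_obs, reward, done, info = env.step(action)
    trajectory.append((obs, action, reward, done))
    if done:
        # the environment is "recycled": a fresh episode begins, and the
        # rollout keeps filling the same num_steps buffer across episodes
        obs = env.reset()
    else:
        obs = next_obs
```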
-
I think the experts are still TF right now (incompatible with our repo since the torch port). Addresses part of #215.
-
## Purpose
The purpose of this issue (discussion) is to introduce a series of PRs in the near future aimed at releasing Tianshou's full benchmark for the MuJoCo Gym task suite.
This benchmark will inc…
-
Hello. I'm just getting started with SpinningUp and have encountered an issue when I try to run ExperimentGrid. Full disclosure: I'm running Windows and I followed the instructions linked on the spinn…
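For reference, the kind of minimal ExperimentGrid script I'm trying to run, adapted from the Spinning Up docs (the grid values here are just placeholders):

```python
from spinup.utils.run_utils import ExperimentGrid
from spinup import ppo_pytorch

if __name__ == '__main__':
    # the __main__ guard matters on Windows, where worker processes
    # re-import this module instead of forking
    eg = ExperimentGrid(name='ppo-grid-test')
    eg.add('env_name', 'CartPole-v0')
    eg.add('seed', [0, 10])
    eg.add('epochs', 10)
    eg.run(ppo_pytorch, num_cpu=1)
```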
-
I failed to train some PPO agents on Acrobot-v1: the test reward never changes, staying at -500 (i.e., -1 per step for the full 500-step limit, meaning the goal is never reached). My code is the same as test/discrete/test_ppo, except the env is Acrobot-v1. Also, when I use a custom ac…
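For what it's worth, a quick sanity check of the reward structure (classic gym API), confirming that -500 is exactly the return of a policy that never solves the task:

```python
import gym

env = gym.make("Acrobot-v1")
env.reset()
total, done = 0.0, False
while not done:
    # Acrobot-v1 gives -1 every step until the goal height is reached
    _, reward, done, _ = env.step(env.action_space.sample())
    total += reward
print(total)  # a random policy almost always hits the 500-step limit: -500.0
```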
-
### System information
- **OS Platform and Distribution (e.g., Linux Ubuntu 16.04)**: Linux Ubuntu 16.04
- **Ray installed from (source or binary)**: source
- **Ray version**: 0.6.5
- **Python…
-
### What is the problem?
When running a simple RLlib training script, almost identical to the example [here](https://docs.ray.io/en/master/rllib-training.html#basic-python-api), I get the follo…
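The script is essentially the linked docs example, which (assuming a version of RLlib where `ray.rllib.agents.ppo.PPOTrainer` is the entry point) looks roughly like this:

```python
import ray
import ray.rllib.agents.ppo as ppo

ray.init()

config = ppo.DEFAULT_CONFIG.copy()
config["num_gpus"] = 0
config["num_workers"] = 1

trainer = ppo.PPOTrainer(config=config, env="CartPole-v0")

for i in range(3):
    result = trainer.train()
    print(result["episode_reward_mean"])
```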
-
**Important Note: We do not do technical support or consulting** and we don't answer personal questions via email.
Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [R…
-
### Question
Hello, I am writing a custom evaluation callback for PPO (mostly based on the SB3 evaluation callback). Since PPO uses VecNormalize for the training envs, how could I pass the statistics f…
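A sketch of the pattern I'm considering, using SB3's `sync_envs_normalization` helper (the env id is just an example; the eval env gets its own VecNormalize wrapper with training disabled):

```python
import gym
from stable_baselines3.common.vec_env import (
    DummyVecEnv,
    VecNormalize,
    sync_envs_normalization,
)

make_env = lambda: gym.make("Pendulum-v1")

train_env = VecNormalize(DummyVecEnv([make_env]))
eval_env = VecNormalize(
    DummyVecEnv([make_env]),
    training=False,     # freeze the running statistics during evaluation
    norm_reward=False,  # report raw (unnormalized) rewards at eval time
)

# copy the running obs/return statistics from the training envs to eval
sync_envs_normalization(train_env, eval_env)
```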
-
### Question
For PPO, I understand that advantage normalization (for each batch of experiences) is sort of a standard practice. I've seen other implementations do it, too. However, I find it a litt…
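For concreteness, the per-batch normalization I'm referring to is the usual one-liner (with a small epsilon for numerical stability):

```python
import torch

def normalize_advantages(adv: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # standardize within the batch: zero mean, unit standard deviation
    return (adv - adv.mean()) / (adv.std() + eps)
```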