-
I encountered a problem with the implementation of MO_PPO.
There was a dimension mismatch in the `_reward` vector of the `sync_vector_env` environment.
I worked around the issue by extending th…
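For context, a minimal sketch of that kind of workaround, assuming a recent Gymnasium where `SyncVectorEnv` keeps per-env rewards in a `_rewards` buffer of shape `(num_envs,)`; the class name `MOSyncVectorEnv` and the `reward_dim` parameter are illustrative, not taken from the original report:
```python
import numpy as np
from gymnasium.vector import SyncVectorEnv

class MOSyncVectorEnv(SyncVectorEnv):
    """SyncVectorEnv variant whose reward buffer can hold vector rewards."""

    def __init__(self, env_fns, reward_dim: int):
        super().__init__(env_fns)
        # The stock buffer assumes one scalar reward per sub-environment,
        # which is where the dimension mismatch comes from when `step`
        # returns a reward vector. Re-allocate it with an objective axis.
        self._rewards = np.zeros((self.num_envs, reward_dim), dtype=np.float64)
```
Whether this alone is enough depends on how the surrounding vector-env code indexes the buffer, so treat it as a starting point rather than a fix.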
-
### 🐛 Bug
Hi
I switched from PPO to MaskablePPO and since then I have been running into an error. The interesting part is that it does not occur immediately; it shows up only after 100k-300k timesteps. Ve…
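The traceback is cut off above, but one common cause of MaskablePPO failures that surface only after many timesteps is reaching a state whose mask marks every action invalid. A minimal sketch of the masking setup with sb3-contrib's `ActionMasker`; `make_custom_env` and `valid_action_mask` are hypothetical stand-ins for the reporter's code:
```python
import numpy as np
from sb3_contrib import MaskablePPO
from sb3_contrib.common.wrappers import ActionMasker

def mask_fn(env) -> np.ndarray:
    # One boolean per discrete action. If every entry is False, the masked
    # distribution has no support and sampling fails, which can happen only
    # once training wanders into such a state (hence the 100k+ step delay).
    mask = env.unwrapped.valid_action_mask()  # hypothetical helper
    assert mask.any(), "at least one action must stay valid"
    return mask

env = ActionMasker(make_custom_env(), mask_fn)  # make_custom_env is hypothetical
model = MaskablePPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=500_000)
```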
-
## Description
We are launching a `rosbag2_py.Recorder()` instance from inside Python and, related to #1458, we do not want it to affect any surrounding control flow.
Additionally, for now (perhap…
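One way to keep the recorder out of the surrounding control flow is to run it on a daemon thread and contain its exceptions there. A minimal sketch, assuming the `Recorder.record(storage_options, record_options)` entry point and a `cancel()` method; option attribute names such as `RecordOptions.all` vary between rosbag2 releases, so check your distro's bindings:
```python
import threading
import rosbag2_py

def start_recorder(uri: str):
    storage_options = rosbag2_py.StorageOptions(uri=uri, storage_id="sqlite3")
    record_options = rosbag2_py.RecordOptions()
    record_options.all = True  # record every topic (attribute name varies by release)

    recorder = rosbag2_py.Recorder()

    def _run():
        try:
            recorder.record(storage_options, record_options)
        except Exception as exc:
            # Keep recorder failures from propagating into the caller's flow.
            print(f"recorder stopped: {exc}")

    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    return recorder, thread

recorder, thread = start_recorder("my_bag")
# ... surrounding control flow runs unaffected ...
recorder.cancel()  # stop recording; present in recent rosbag2 releases
thread.join()
```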
-
### What happened + What you expected to happen
I want to train a PPO agent in my custom environment called RankingEnv, but I'm encountering several errors and warnings that result in the agent's t…
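The errors themselves are truncated above, but a first diagnostic pass that catches most custom-environment problems is Gymnasium's environment checker; a minimal sketch, with `RankingEnv`'s constructor arguments assumed:
```python
from gymnasium.utils.env_checker import check_env

# RankingEnv is the reporter's custom environment; its constructor
# arguments are not shown in the excerpt, so none are passed here.
env = RankingEnv()
check_env(env)  # raises if reset/step signatures, spaces, or dtypes are off
```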
-
### ❓ Question
Hello,
I am learning how to implement a custom CNN policy and environment with Stable-Baselines3. I am following the "Custom Feature Extractor" example at this link:
https://s…
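For reference, the linked tutorial boils down to subclassing `BaseFeaturesExtractor` and handing it to the policy via `policy_kwargs`. A condensed sketch along the lines of that docs example; the layer sizes and environment ID are placeholders:
```python
import torch as th
import torch.nn as nn
from gymnasium import spaces
from stable_baselines3 import PPO
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor

class CustomCNN(BaseFeaturesExtractor):
    def __init__(self, observation_space: spaces.Box, features_dim: int = 128):
        super().__init__(observation_space, features_dim)
        n_input_channels = observation_space.shape[0]  # channel-first images
        self.cnn = nn.Sequential(
            nn.Conv2d(n_input_channels, 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened size with a dry forward pass.
        with th.no_grad():
            sample = th.as_tensor(observation_space.sample()[None]).float()
            n_flatten = self.cnn(sample).shape[1]
        self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.ReLU())

    def forward(self, observations: th.Tensor) -> th.Tensor:
        return self.linear(self.cnn(observations))

policy_kwargs = dict(
    features_extractor_class=CustomCNN,
    features_extractor_kwargs=dict(features_dim=128),
)
model = PPO("CnnPolicy", "BreakoutNoFrameskip-v4", policy_kwargs=policy_kwargs)
```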
-
The source code for the `step` function of `gymnasium.wrappers.time_limit.TimeLimit` is [as follows](https://github.com/openai/gym/blob/dcd185843a62953e27c2d54dc8c2d647d604b635/gym/wrappers/time_limit…
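For readers following along, the logic of that method is short. A paraphrase of the Gymnasium five-tuple version is sketched below; note the linked gym commit is the older four-tuple API that sets `info["TimeLimit.truncated"]` instead:
```python
# Paraphrased TimeLimit.step: count elapsed steps and flip `truncated`
# once the configured limit is reached.
def step(self, action):
    observation, reward, terminated, truncated, info = self.env.step(action)
    self._elapsed_steps += 1
    if self._elapsed_steps >= self._max_episode_steps:
        truncated = True
    return observation, reward, terminated, truncated, info
```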
-
### ❓ Question
The question seems complicated, but it is not.
Given the following min example:
```python
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3.common.env_ut…
-
I ran the example given
```
import os
os.environ["WANDB_DISABLED"] = "true"
!python examples/nlg-reddit/sample-level-dp/fine-tune-dp.py \
--output_dir scratch \
--model_name sshleifer/tiny-gpt2 …
-
### 🐛 Bug
When logging info to TensorBoard, `self.logger.dump(step=self.num_timesteps)` is called after `self.logger.record`
```python
self.logger.record("time/iterations", iteration, exclude="tensor…
-
Amazing work! However, I encountered some problems while using it.
The first problem: if I open more than one environment, i.e. `num_env > 1`, doesn't that make the visualization unreasonable?
The code below…