-
Right now, only hyperparameters that are searched by default can have their params dict copied and reused, due to naming issues. This should be extended to hyperparameters that are not searched by de…
-
I tried your script in the MountainCar env, and it seems that the game ends when the episode length reaches 200 steps, but in your TensorBoard plots an episode didn't stop until it reached the final st…
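For reference, a minimal sketch (assuming the classic gym API) showing where the 200-step cap comes from: MountainCar-v0 is registered with a TimeLimit wrapper.
```python
import gym

env = gym.make("MountainCar-v0")
# MountainCar-v0 is registered with max_episode_steps=200, so the
# TimeLimit wrapper ends every episode after at most 200 steps.
print(env.spec.max_episode_steps)  # -> 200
```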
-
I am attempting to access the Atari environments, and despite importing the latest versions of ale-py, autorom, gym, and even gymnasium, I get the following error when attempting to make an environment of a…
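For context, a minimal sketch of one standard way to create an ALE environment with gymnasium and ale-py; the env id here is just an example, and the explicit registration call may only apply to recent versions.
```python
import gymnasium as gym
import ale_py

# On recent gymnasium/ale-py versions the ALE/* ids must be registered
# explicitly; on older versions the `import ale_py` alone suffices.
gym.register_envs(ale_py)

env = gym.make("ALE/Breakout-v5")
obs, info = env.reset()
```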
-
Hi all, I have run DDPG with default hyperparameters in the MuJoCo Swimmer-v2 environment, but the reward converges to a very low value, only 4 or 5, so the swimmer cannot swim at all. I did not change th…
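For reproducibility, a minimal sketch of such a run; the library is not named in the report, so stable-baselines3 is assumed here.
```python
import gym
from stable_baselines3 import DDPG

env = gym.make("Swimmer-v2")
model = DDPG("MlpPolicy", env)          # default hyperparameters throughout
model.learn(total_timesteps=1_000_000)  # reward reportedly plateaus around 4-5
```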
-
Is this not a flawed approach, since you have fundamentally changed the problem by altering the reward? To compare against other solutions, the environment should be the same (including the rewards given).
-
### 🚀 Feature
Hi!
I would like to implement a recurrent soft actor-critic. Is it a sensible contribution?
### Motivation
I actually need this algorithm in my projects.
### Pitch
The sb3 e…
-
This is the figure I'm referring to:
-
### ❓ Question
Consider this setup:
```python
import stable_baselines3
import gym
from stable_baselines3 import DQN, A2C, PPO
# from sb3_contrib import ARS, TRPO
env = gym.make('MountainCar-v0…
```
-
Currently, the way to train an agent is to 1) instantiate the environment, 2) instantiate the agent, passing the environment to the constructor, and 3) call the learn method.
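A minimal sketch of that three-step flow, using stable-baselines3 with an arbitrary algorithm and environment as an example:
```python
import gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")        # 1) instantiate the environment
model = PPO("MlpPolicy", env)        # 2) instantiate the agent, passing the env
model.learn(total_timesteps=10_000)  # 3) call the learn method
```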
Some agent frameworks hav…
-
Hey, thanks for providing purejaxrl, it is pretty awesome.
I have used the experimental `S5` code that you provide for part of my research, and after version 0.4.27 (same for 0.4.28) of `jaxlib` I hav…