-
It seems to me that when HER samples an achieved goal from the replay buffer, it never samples the very last state of the episode. Is this intended?
As a consequence, the sampling strategy "final" …
-
Hi,
it seems like the state features that are added to the replay buffer in the HindsightExperienceReplayWrapper are features for the full observation-dictionary (observation, achieved-goal, desire…
-
Dear @rafaelpossas
Thanks for the code. I tried to run an experiment with it, but the pickle file does not seem to load properly. I get the following error.
```
Traceback (most re…
-
In ddpg.py, the parameter `nb_rollout_steps` is an integer containing the number of rollout steps. I believe that this is the same as the parameter `T` in [OpenAI baselines](https://github.com/openai/…
-
TensorFlow takes minutes to import on a Raspberry Pi Zero W and that's probably because of the huge .so file with native primitives it has to load, among other things. Given the nature of the project,…
-
Hello, I am pretty new to MPI.
I am using stable-baselines DDPG for a custom environment. Everything is working fine and I am getting good results as well.
Question:
When I use MPI and run the co…
-
## Bug Description
I am running into an issue with the traffic_light_grid examples in both the stable_baselines and rllib sets of examples.
For the stable baselines example, the script runs…
-
Hello, thank you for sharing your code.
In https://github.com/openai/baselines/pull/1027 you said you did not have much success integrating ppo2 into gail. Can you tell me what kind of "not su…
-
- [x] I have marked all applicable categories:
    + [x] exception-raising bug
    + [x] RL algorithm bug
    + [ ] documentation request (i.e. "X is missing from the documentation.")
    + [ ] ne…
-
### 🚀 Feature
Build the STAC algorithm as a callable algorithm: https://arxiv.org/pdf/2002.12928.pdf
### Motivation
Hyperparameter tuning is one of the most time- and cost-intensive things when training R…