-
Thanks for your great work; I was reading your blog recently. This may be a naive question, but I don't really understand what `logits` means in your code. I only know that it is the raw output of the last `De…
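For anyone with the same question, here is a minimal sketch of what "logits" usually refers to in a policy network. The layer sizes and names below are illustrative, not taken from the blog's code:

```python
import tensorflow as tf

n_actions = 4
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    # No activation here: the raw, unnormalized outputs of this Dense layer are the "logits".
    tf.keras.layers.Dense(n_actions, activation=None),
])

obs = tf.random.normal([1, 8])      # dummy observation batch
logits = model(obs)                 # raw scores, one per action; can be any real number
probs = tf.nn.softmax(logits)       # softmax maps logits to action probabilities that sum to 1
```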
-
I would like to report an issue that has a significant impact on the critic-based algorithms for the MPE PredatorPrey task. The `target_value` variable, built for training the critic network, is about…
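For context, a common way such a target is formed in actor-critic code is a one-step bootstrapped return; this is only a generic sketch with illustrative names, not this repo's actual computation:

```python
import numpy as np

def compute_target_values(rewards, dones, next_values, gamma=0.99):
    """One-step bootstrapped critic target: r_t + gamma * V(s_{t+1}) * (1 - done_t).
    The critic is then regressed toward these targets, e.g. with an MSE loss."""
    return rewards + gamma * next_values * (1.0 - dones)
```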
-
### 🐛 Bug
Hello, I was packaging stable_baselines3 for NixOS because there is currently no package for it. After successfully fetching the source, I encountered the error below when importing it. It …
-
I was surprised to see this loss function because it is generally used when the target is a distribution (i.e., one that sums to 1). That is not the case for the advantage estimate. However, I worked out the ma…
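For what it's worth, here is one way the math can work out, assuming the loss in question is a categorical cross-entropy with the advantage folded into the target. Even though the target $t = \hat{A}_t\,\mathrm{onehot}(a_t)$ does not sum to 1, the loss reduces to the usual policy-gradient objective:

$$
\mathrm{CE}(t, \pi_\theta) = -\sum_i t_i \log \pi_\theta(i \mid s_t) = -\hat{A}_t \log \pi_\theta(a_t \mid s_t),
\qquad
\nabla_\theta \mathrm{CE} = -\hat{A}_t\, \nabla_\theta \log \pi_\theta(a_t \mid s_t),
$$

which is exactly the (negative) policy-gradient estimator, so the "target must be a distribution" interpretation is not actually required for the gradient to be correct.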
-
Take RLlib / Stable Baselines / another library of algorithms and add code to the repo that runs PPO or A2C on the CartPole environment from Gym. The next step is to check how, in that librar…
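As a possible starting point, a minimal run with Stable-Baselines3 (one choice of library; hyperparameters are illustrative) could look like this:

```python
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# Train PPO on CartPole; SB3 builds the Gym environment from the id string.
model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=50_000)

# Quick sanity check of the trained policy
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
print(f"mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")
```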
-
Hi everybody,
first of all, many thanks for putting this repo online. When I found it, I thought something like: "High-quality implementations of RL algorithms? That is pretty cool." However, after having a …
-
Is there a particular reason why `VecNormalize` is only applied to 1-D observations? If so, wouldn't it make sense to at least apply the reward normalization?
https://github.com/ikostrikov/pytorch-…
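For reference, reward-only normalization does not depend on the observation shape at all; a sketch of the usual scheme (normalizing by the standard deviation of a running discounted return) could look like this. Class and variable names are illustrative, not taken from the repo:

```python
import numpy as np

class RunningMeanStd:
    """Tracks a running mean and variance via the parallel-moments update."""
    def __init__(self, eps=1e-4):
        self.mean, self.var, self.count = 0.0, 1.0, eps

    def update(self, x):
        batch_mean, batch_var, batch_count = np.mean(x), np.var(x), len(x)
        delta = batch_mean - self.mean
        tot = self.count + batch_count
        self.mean = self.mean + delta * batch_count / tot
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        self.var = (m_a + m_b + delta**2 * self.count * batch_count / tot) / tot
        self.count = tot

class RewardNormalizer:
    """Scales rewards by the std of a running discounted return, per environment."""
    def __init__(self, num_envs, gamma=0.99, clip=10.0):
        self.ret = np.zeros(num_envs)
        self.rms = RunningMeanStd()
        self.gamma, self.clip = gamma, clip

    def __call__(self, rewards, dones):
        self.ret = self.ret * self.gamma + rewards
        self.rms.update(self.ret)
        self.ret[dones] = 0.0
        return np.clip(rewards / np.sqrt(self.rms.var + 1e-8), -self.clip, self.clip)
```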
-
Is there any reason that DDPG doesn't have a `load_path` parameter, like A2C, that allows restoring trained weights? I'm adding it to my own copy of the code, but was wondering if there's some known proble…
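For what it's worth, assuming this refers to openai/baselines and that its `tf_util` helpers are available, the restore step could be a small conditional like the sketch below (the helper name and placement are my assumptions):

```python
from baselines.common.tf_util import load_variables

def maybe_restore(load_path=None):
    """Hypothetical helper: restore previously saved weights into the current
    TF session if a checkpoint path was given."""
    if load_path is not None:
        load_variables(load_path)
```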
-
E.g., when running `python3 main.py --type A2C --env CartPole-v1`:
- many libraries are missing (opencv, pandas, tensorflow); I installed some versions, but they seem incompatible; also, Python 3.7 seems in…
-
## Objective
COM-based line-following controller
## Algorithm
A2C or PPO
## Reward
Stay in the center of the line with high velocity.
## Observation
- center line
- IMU?
## Action
- $v_x$ and $…
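To make the proposal more concrete, here is a rough sketch of what the task could look like as a Gym environment. The observation contents, action layout, and reward weighting are all my assumptions, not decisions from this issue:

```python
import numpy as np
import gym
from gym import spaces

class LineFollowEnv(gym.Env):
    """Hypothetical skeleton for the COM-based line-following task."""

    def __init__(self, n_line_points=10, max_speed=1.0):
        super().__init__()
        # Observation: sampled (x, y) points of the center line ahead of the robot;
        # IMU readings could be appended here if they turn out to be needed.
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(2 * n_line_points,), dtype=np.float32)
        # Action: commanded v_x plus one more command (e.g. lateral velocity or yaw rate).
        self.action_space = spaces.Box(
            low=-max_speed, high=max_speed, shape=(2,), dtype=np.float32)

    def reset(self):
        return np.zeros(self.observation_space.shape, dtype=np.float32)

    def step(self, action):
        obs = np.zeros(self.observation_space.shape, dtype=np.float32)
        lateral_error = 0.0  # placeholder: distance of the COM from the center line
        # Reward from the issue: stay centered while keeping a high forward velocity.
        reward = float(action[0]) - abs(lateral_error)
        return obs, reward, False, {}
```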