-
### What is the problem?
I am using BOHB to optimize the hyperparameters of the DQN algorithm in order to solve the MountainCar-v0 problem.
I always run into the following issue (even if I use…
-
To benchmark AIRL, I'm planning to compare the imitation performance of our new AIRL implementation on modern gym envs against the performance of the old AIRL implementation (on adamgleve/inverse_rl) …
-
**Describe the bug**
Hyperparameter optimization breaks the TensorBoard logging. When it is active and multiple optimization jobs are running, all datapoints are logged to the last job's TensorBoard.…
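A common workaround, sketched here without reference to this codebase's actual API, is to give every optimization job its own writer pointing at a unique log directory; `job_log_dir` below is a hypothetical helper:

```python
import os

def job_log_dir(base_dir, job_id):
    """Build a unique TensorBoard log directory per optimization job.

    Hypothetical helper: each job passes its own directory to its own
    SummaryWriter instead of all jobs sharing one global writer.
    """
    return os.path.join(base_dir, f"job_{job_id}")

# Each job then creates its own writer, e.g.:
# writer = SummaryWriter(log_dir=job_log_dir("runs", 3))
```

With one writer per directory, each job's datapoints stay in its own run instead of being appended to the last job's event file.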
-
Currently in the MountainCar-v0 environment, the [timestep_limit is 200](https://github.com/openai/gym/blame/master/gym/envs/__init__.py#L70), which makes learning very difficult: most initial policies…
-
Hi, what exciting work you have done in tianshou! But I still have some doubts while using the code in my experiments. I have found that there is no difference between the continuous-version PPO and the discre…
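The truncated question appears to concern continuous- versus discrete-action PPO. In many implementations the PPO loss itself is identical in both cases; only the action distribution changes (Categorical over logits for discrete actions, Gaussian for continuous ones). A minimal stdlib sketch of that distinction (the function names are illustrative, not tianshou's API):

```python
import math

def categorical_log_prob(logits, action):
    """Log-probability of a discrete action under softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[action] - log_z

def gaussian_log_prob(mean, std, action):
    """Log-probability of a continuous action under N(mean, std^2)."""
    return (-0.5 * ((action - mean) / std) ** 2
            - math.log(std) - 0.5 * math.log(2 * math.pi))

def ppo_ratio(new_log_prob, old_log_prob):
    """PPO's importance ratio uses the log-probs identically in both cases."""
    return math.exp(new_log_prob - old_log_prob)
```

tianshou, for instance, parameterizes its PPO policy with a `dist_fn`, so switching between the discrete and continuous cases is largely a matter of which distribution is passed in.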
-
**Describe the bug**
When we use reward normalization, it is expected that evaluations are done with the original reward values. This is actually done for training (train.py: lines 291-298), but evalua…
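One common pattern (a sketch under generic assumptions, not this repo's actual train.py code) is to return the original reward alongside the normalized one, so that evaluation can always log unnormalized returns:

```python
class RewardNormalizer:
    """Minimal sketch: normalize rewards for training while keeping the
    original value available for evaluation logging."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations (Welford's algorithm)

    def update(self, reward):
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)

    def normalize(self, reward):
        """Return (normalized_reward, original_reward)."""
        self.update(reward)
        std = (self.m2 / self.count) ** 0.5 if self.count > 1 else 1.0
        return (reward - self.mean) / (std + 1e-8), reward
```

Training consumes the first element of the tuple; evaluation reports the second, so normalized values never leak into evaluation metrics.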
-
The wiki for MountainCar-v0 says that the episode ends when you reach position 0.5, or once 200 iterations are reached. But I didn't find any check on the number of iterations in the code.…
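The check is not in the environment file itself: Gym enforces the limit through the `TimeLimit` wrapper that `gym.make` applies using the registry's `max_episode_steps`. A minimal pure-Python sketch of how such a wrapper works (the class names here are illustrative, not Gym's source):

```python
class ToyEnv:
    """Stand-in env that never terminates on its own (like a policy that
    never reaches position 0.5)."""
    def reset(self):
        self.t = 0
        return 0
    def step(self, action):
        self.t += 1
        return 0, -1.0, False, {}  # obs, reward, done, info

class TimeLimit:
    """Sketch of a time-limit wrapper: it forces done=True after
    max_episode_steps, which is why no iteration check appears in the
    environment's own step()."""
    def __init__(self, env, max_episode_steps=200):
        self.env = env
        self.max_episode_steps = max_episode_steps
    def reset(self):
        self.elapsed = 0
        return self.env.reset()
    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.elapsed += 1
        if self.elapsed >= self.max_episode_steps:
            done = True
            info["TimeLimit.truncated"] = True
        return obs, reward, done, info
```

So the env returned by `gym.make("MountainCar-v0")` is already wrapped, and the 200-step cutoff happens in the wrapper, not in the MountainCar code one finds by reading the environment file.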
-
Thanks for your development; this seems to be an inspiring project!
However, when I tried to launch a command from the Examples:
`python run.py --gym -a ppo -n train_using_gym --gym-env MountainCar-v0 --rend…
-
**Improving description for `argparse.ArgumentParser` in `stable_baselines\deepq\experiments\train_mountaincar.py`**
Line 34 in the file reads:
```python
parser = argparse.ArgumentParser(de…
-
Hello,
I ran into a problem when I tried to use DDPG + HER.
The problem seems to be in the definition of observation_space.
```bash
Traceback (most recent call last):
File "/home/all-jy/git/jy_gym_stfl/tr…