-
$`python main.py --task evaluate --stochastic_policy True --expert_path gailtf/baselines/ppo1/stochastic.ppo.Hopper.0.00.pkl --load_model_path gailtf/baselines/ppo1/checkpoint/ppo.Hopper.0.00/ppo.Hopp…
-
Hi, can you merge this into the code in an efficient way? Or can you give me some advice on where I should load the pre-trained weights into the policy network?
Next Question - What do you think about …
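For illustration, here is a minimal sketch of one way pre-trained weights could be loaded into a policy before training begins. It assumes the checkpoint is a pickled `name -> array` mapping, which is a guess at the `.pkl` layout; the function and key names are hypothetical and would need adapting to the actual checkpoint format.

```python
import pickle

def load_pretrained(policy_vars, ckpt_path):
    """Overwrite a policy's initial parameters with pickled pre-trained ones.

    `policy_vars` is assumed to be a name -> array dict; the checkpoint
    layout here is hypothetical and must match the real .pkl contents.
    """
    with open(ckpt_path, "rb") as f:
        saved = pickle.load(f)
    for name, value in saved.items():
        if name in policy_vars:
            # Replace the randomly initialized value before training starts.
            policy_vars[name] = value
    return policy_vars
```

Keys present in the checkpoint but not in the policy are simply ignored, so a partial restore (e.g. only the policy trunk) works with the same code.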
-
Nice implementation! But I found some inconsistency in your discriminator.py.
If you check formula 17 of the paper [Generative Adversarial Imitation Learning](http://papers.nips.cc/paper/6391-generative-…
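For reference, a minimal NumPy sketch of a GAIL-style discriminator objective. The label convention (policy samples = 0, expert samples = 1) is only one of several used in implementations, and the sign of the surrogate reward depends on that convention, which is exactly where inconsistencies tend to creep in.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator_loss(logits_pi, logits_exp):
    """Binary cross-entropy for the discriminator, under the (assumed)
    convention: policy samples labeled 0, expert samples labeled 1,
    so D(s, a) -> 1 means "looks like the expert"."""
    eps = 1e-8
    d_pi = sigmoid(logits_pi)
    d_exp = sigmoid(logits_exp)
    return -(np.log(1.0 - d_pi + eps).mean() + np.log(d_exp + eps).mean())

def surrogate_reward(logits_pi):
    # One common generator reward under the labels above:
    # high when the discriminator mistakes policy samples for expert ones.
    return -np.log(1.0 - sigmoid(logits_pi) + 1e-8)
```

Flipping the labels flips which of `-log(1 - D)` and `-log D` is the correct reward, so the loss and the reward must be checked as a pair.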
-
Thanks again for this wonderful project!
1. I am curious whether you are planning to release imitation learning code. Right now I can see we are using something like a behavior cloning strategy (CNN …
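To make the distinction concrete, behavior cloning reduces imitation to plain supervised regression on expert (state, action) pairs, with no environment interaction. A minimal sketch with a linear policy and mean-squared-error gradient descent (all names here are illustrative, not from the repo):

```python
import numpy as np

def behavior_cloning(states, actions, lr=0.1, iters=500):
    """Fit a linear policy a = s @ W by regressing onto expert actions.
    This is pure supervised learning: no rollouts, no reward signal."""
    n, ds = states.shape
    da = actions.shape[1]
    W = np.zeros((ds, da))
    for _ in range(iters):
        pred = states @ W
        grad = states.T @ (pred - actions) / n  # gradient of the MSE loss
        W -= lr * grad
    return W
```

GAIL differs in that the "reward" comes from a learned discriminator and the policy is trained with RL against it, rather than by direct regression.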
-
Hi,
When optimizing the discriminator to output probabilities (later used as rewards) for each [state, action] tuple, you consider the whole batch.
By doing this, don't we lose the sequentia…
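The point at issue can be shown in a few lines: a feed-forward discriminator scores each (state, action) tuple independently, so permuting the batch just permutes the outputs; no temporal structure within the batch is modeled (the linear scorer below is an illustrative stand-in, not the repo's network).

```python
import numpy as np

def disc_scores(W, states, actions):
    """A feed-forward discriminator scores each (state, action) tuple
    row by row: there is no recurrence, so the order of tuples in the
    batch carries no information."""
    x = np.concatenate([states, actions], axis=1)
    return x @ W  # one logit per tuple, computed independently
```

Capturing sequential dependence would require a different architecture (e.g. a recurrent discriminator over trajectory segments), not just a different batching scheme.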
-
Hi Yusuke-san,
I really admire your coding skills.
I have reviewed another GAIL implementation based on TRPO. After reading your GAIL code, I noticed a common setting for the parameter 'stochastic'. In run_ppo …
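As background for what a `stochastic` flag typically toggles in such codebases: for a Gaussian policy it switches between sampling an action from the distribution and returning its mean. A hedged sketch (the function and parameter names are illustrative, not taken from either repo):

```python
import numpy as np

def act(mean, log_std, stochastic=True, rng=None):
    """Gaussian-policy action selection. With stochastic=True, sample from
    N(mean, exp(log_std)^2); with stochastic=False, return the
    deterministic mean action (common for evaluation)."""
    if not stochastic:
        return mean
    if rng is None:
        rng = np.random.default_rng(0)
    return mean + np.exp(log_std) * rng.standard_normal(mean.shape)
```

This is why the flag tends to appear identically in TRPO- and PPO-based GAIL code: it belongs to the policy head, not to the RL algorithm.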
-
Hi,
I'm not sure if this is a PyTorch-specific issue, but during training my GPU memory usage grows dramatically. For my experiment I'm using a high number of samples to optimize my discriminator and…
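One common cause of this pattern in PyTorch, offered here only as a guess: rewards computed from the discriminator stay attached to the autograd graph, so keeping them across iterations keeps every past graph alive. A sketch of computing rewards under `no_grad()` so nothing is retained (function and names are illustrative):

```python
import torch

def gail_rewards(discriminator, states, actions):
    """Compute per-sample rewards from the discriminator without building
    an autograd graph. Storing graph-connected tensors across PPO
    iterations is a frequent cause of steadily growing GPU memory."""
    with torch.no_grad():
        logits = discriminator(torch.cat([states, actions], dim=1))
        return -torch.log(1.0 - torch.sigmoid(logits) + 1e-8)
```

If rewards must be computed inside a graph-building context, `.detach()` on the result achieves the same decoupling before they are stored.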
-
[2017-10-18 19:40:13,749]
I ran the code in Python 3.6.2, and have installed MuJoCo and configured the mjkey successfully:
$Making new env: Ant-v1
Traceback (most recent call last):
File "run…