PyTorch implementation of reinforcement learning algorithms

This repository contains:
Policy gradient methods (TRPO, PPO, A2C)
Generative Adversarial Imitation Learning (GAIL)

Important notes

The code now works for PyTorch 0.4. For PyTorch 0.3, please check out the 0.3 branch.
To run MuJoCo environments, first install mujoco-py and gym.
If you have a GPU, I recommend setting OMP_NUM_THREADS to 1. PyTorch creates additional threads when performing computations, which can hurt multiprocessing performance. The problem is most serious on Linux, where multiprocessing can be even slower than a single thread:
```
export OMP_NUM_THREADS=1
```
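
If exporting the variable in the shell is inconvenient, the same limit can probably be applied from inside a script. The snippet below is a minimal sketch, not part of this repository: it assumes the variable is set before `torch` is first imported, and it also calls `torch.set_num_threads(1)` as an explicit cap when PyTorch is available.

```python
import os

# Set the variable before importing torch, because the OpenMP runtime
# reads it when it is initialized. setdefault keeps any value that was
# already exported in the shell.
os.environ.setdefault("OMP_NUM_THREADS", "1")

try:
    import torch
except ImportError:
    # PyTorch is not installed; the environment variable is still set.
    torch = None

if torch is not None:
    # Also cap PyTorch's intra-op thread pool explicitly.
    torch.set_num_threads(1)
```

Placing these lines at the very top of an entry script keeps the setting in one place for every worker process started from it.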

Supports discrete and continuous action spaces.
Supports multiprocessing so the agent can collect samples in multiple environments simultaneously (8x faster than a single thread).
Fast Fisher vector product calculation. For this part, Ankur kindly wrote a blog post explaining the implementation details.
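
To illustrate what a Fisher vector product is, here is a minimal sketch for the simplest case: a categorical policy parameterized directly by its logits. In that case the Fisher matrix with respect to the logits is diag(p) - p p^T, so its product with a vector v can be computed in linear time without forming the matrix. The function names are hypothetical and this is not the repository's implementation, which deals with neural-network policies.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fisher_vector_product(logits, v):
    """Return F @ v for a categorical policy with the given logits.

    With respect to the logits, F = diag(p) - p p^T, so
    F @ v = p * v - p * (p . v), which needs no explicit matrix.
    """
    p = softmax(logits)
    p_dot_v = sum(pi * vi for pi, vi in zip(p, v))
    return [pi * (vi - p_dot_v) for pi, vi in zip(p, v)]


def fisher_matrix(logits):
    """Explicit Fisher matrix, built only to check the fast product."""
    p = softmax(logits)
    n = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    logits, v = [0.5, -1.0, 2.0], [1.0, 2.0, 3.0]
    fast = fisher_vector_product(logits, v)
    F = fisher_matrix(logits)
    slow = [sum(F[i][j] * v[j] for j in range(3)) for i in range(3)]
    # The two results should agree up to floating-point error.
    print(max(abs(a - b) for a, b in zip(fast, slow)))
```

TRPO needs such products inside its conjugate gradient solver, which is why avoiding the explicit matrix matters once the parameter count is large.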
Policy gradient methods
Trust Region Policy Optimization (TRPO) -> examples/trpo_gym.py
Proximal Policy Optimization (PPO) -> examples/ppo_gym.py
Synchronous A3C (A2C) -> examples/a2c_gym.py
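
For readers unfamiliar with PPO, the sketch below shows the clipped surrogate objective in plain Python. It is an illustration under assumptions, not the code in examples/ppo_gym.py: the function name and the default `clip_epsilon=0.2` are hypothetical, and `ratios` stands for the new-to-old policy probability ratios of the sampled actions.

```python
def ppo_clipped_surrogate(ratios, advantages, clip_epsilon=0.2):
    """Mean clipped surrogate objective, to be maximized.

    ratios:     pi_new(a|s) / pi_old(a|s) for each sample
    advantages: advantage estimate for each sample
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = min(max(r, 1.0 - clip_epsilon), 1.0 + clip_epsilon)
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(r * a, clipped * a)
    return total / len(ratios)
```

Taking the minimum removes any incentive to push the ratio far outside the clipping range in the direction that would otherwise increase the objective.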

To save expert trajectories
python gail/save_expert_traj.py --model-path assets/learned_models/Hopper-v2_ppo.p
To do imitation learning
python gail/gail_gym.py --env-name Hopper-v2 --expert-traj-path assets/expert_traj/Hopper-v2_expert_traj.p