
Lattice (Latent Exploration for Reinforcement Learning)

This repository includes the implementation of Lattice exploration from the paper Latent Exploration for Reinforcement Learning, published at NeurIPS 2023.

Lattice introduces random perturbations in the latent state of the policy network, which result in correlated noise across the system's actuators. This form of latent noise can facilitate exploration when controlling high-dimensional systems, especially with redundant actuation, and may find low-effort solutions. A short video explaining the project can be found on YouTube.
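To build intuition for why latent noise becomes correlated across actuators, here is a minimal conceptual sketch (not the code of this repository): if a linear output layer maps a latent vector z to actions a = W z, then isotropic noise added to z induces action noise with covariance proportional to W Wᵀ, which couples the actuators.

import numpy as np

# Conceptual sketch of latent exploration (not the code of this repository).
# A linear output layer maps a latent vector z to actions a = W z.
# Isotropic noise on z therefore induces action noise with covariance
# sigma^2 * W W^T, i.e. noise that is correlated across actuators.
rng = np.random.default_rng(0)
latent_dim, action_dim, sigma = 16, 8, 0.1   # hypothetical sizes
W = rng.normal(size=(action_dim, latent_dim))

def act(z):
    return W @ (z + sigma * rng.normal(size=latent_dim))

z = rng.normal(size=latent_dim)
samples = np.stack([act(z) for _ in range(20000)])
# The empirical covariance of the action noise approaches sigma^2 * W W^T,
# with non-zero off-diagonal terms, unlike independent per-actuator noise.
print(np.abs(np.cov(samples.T) - sigma**2 * W @ W.T).max())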

Lattice builds on top of Stable Baselines 3 (version 1.6.1) and is implemented here for Recurrent PPO and SAC. Integration with a more recent version of Stable Baselines 3 and compatibility with more algorithms are currently under development.

This project was developed by Alberto Silvio Chiappa, Alessandro Marin Vargas, Ann Zixiang Huang and Alexander Mathis (EPFL).

MyoChallenge 2023

We used Lattice to train the top submission to the object manipulation track of the NeurIPS 2023 MyoChallenge competition. With curriculum learning, reward shaping and Lattice exploration we trained a policy to control a biologically-realistic arm with 63 muscles and 27 degrees of freedom to place random objects inside a box of variable shape:

[GIF: object relocation with the 63-muscle arm]

We outperformed the other best solutions both in score and effort:

[Figure: score and effort comparison with the other top solutions]

We have also created a dedicated repository for the solution, where we have released the pretrained weights of all the curriculum steps.

Installation

We recommend using a Docker container to execute the code of this repository. We provide both the docker image albertochiappa/myo-cuda-pybullet on DockerHub and, in the docker folder, the Dockerfile used to create the same image locally.
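For instance, you can pull the prebuilt image from DockerHub, or build it locally (the build command below assumes the Dockerfile in the docker folder does not require the repository root as build context):

docker pull albertochiappa/myo-cuda-pybullet
docker build -t albertochiappa/myo-cuda-pybullet docker/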

If you prefer to manually create a Conda environment, you can do so with the commands:

conda create --name lattice python=3.8.10
conda activate lattice
pip install -r docker/requirements.txt
pip install myosuite==1.2.4 
pip install --upgrade cloudpickle==2.2.0 pickle5==0.0.11 pybullet==3.2.5

Please note that there is a version conflict between some packages: for example, stable_baselines3 requires a later version of gym than the one myosuite is compatible with. For this reason we could not include all of the requirements in docker/requirements.txt. In our experiments this incompatibility did not cause any errors.
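As a quick sanity check of the manually created environment, you can verify that the main dependencies import together (a minimal sketch, not part of the repository; it assumes that importing myosuite registers its environments with gym):

# Quick sanity check of the conda environment (not part of the repository).
import gym
import myosuite            # assumed to register MyoSuite environments with gym on import
import pybullet
import stable_baselines3

print("gym:", gym.__version__)
print("stable-baselines3:", stable_baselines3.__version__)
print("myo environments:",
      sum(1 for spec in gym.envs.registry.all() if spec.id.lower().startswith("myo")))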

Training a policy with Lattice

We provide training scripts for various MyoSuite and PyBullet environments.

If you have created a conda environment, training a policy is as easy as

python src/main_pose_elbow.py --use_lattice

If you prefer to use the readily available docker container, you can train like this:

docker run --rm --gpus all -it \
--mount type=bind,src="$(pwd)/src",target=/src \
--mount type=bind,src="$(pwd)/data",target=/data \
--mount type=bind,src="$(pwd)/output",target=/output \
albertochiappa/myo-cuda-pybullet \
python3 src/main_pose_elbow.py --use_lattice

The previous command will start training in the Elbow Pose environment using Recurrent PPO. Simply change the main script name to start training for a different environment. The output of the training, including the configuration used to select the hyperparameters and the tensorboard logs, is saved in a subfolder of output/, named after the current date. The code outputs useful information to monitor the training in Tensorboard format. You can run Tensorboard in the output folder to visualize the learning curves and much more. The different configuration hyperparameters can be set from the command line, e.g., by running
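For example, assuming training wrote its logs to the default location, TensorBoard can be launched with

tensorboard --logdir output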

python src/main_humanoid.py --use_sde --use_lattice --freq=8

In this case, a policy will be trained with SAC in the Humanoid environment, using state-dependent Lattice with update period 8.
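The update period controls how often the latent perturbation is resampled. A conceptual sketch of this mechanism (not the repository's implementation, and omitting the state-dependent scaling of the noise) could look like:

import numpy as np

# Conceptual sketch: keep the same latent perturbation for `freq` steps
# before resampling it (not the repository's implementation).
rng = np.random.default_rng(0)

class PeriodicLatentNoise:
    def __init__(self, latent_dim, sigma=0.1, freq=8):
        self.latent_dim, self.sigma, self.freq = latent_dim, sigma, freq
        self.steps, self.eps = 0, None

    def __call__(self, z):
        if self.steps % self.freq == 0:      # resample every `freq` calls
            self.eps = self.sigma * rng.normal(size=self.latent_dim)
        self.steps += 1
        return z + self.eps

noise = PeriodicLatentNoise(latent_dim=16, freq=8)
z = np.zeros(16)
perturbed = [noise(z) for _ in range(16)]    # steps 0-7 share one perturbation, 8-15 another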

Structure of the repository

Reference

If our work was useful to your research, please cite:

@article{chiappa2023latent,
  title={Latent exploration for reinforcement learning},
  author={Chiappa, Alberto Silvio and Vargas, Alessandro Marin and Huang, Ann Zixiang and Mathis, Alexander},
  journal={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2023}
}