LucasAlegre / morl-baselines

Implementations of Multi-Objective Reinforcement Learning algorithms.
https://lucasalegre.github.io/morl-baselines
MIT License

Parallel Hyperparameter Search #84

Closed lowlypalace closed 1 month ago

lowlypalace commented 8 months ago

PR Description

This PR makes the hyperparameter search more scalable by running the training loop for each agent in parallel. This is especially useful when running the search on a cluster (e.g. Slurm).
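The per-seed parallelism described above can be sketched roughly as follows. `train_one`, `parallel_search`, and the returned dict are hypothetical stand-ins rather than the PR's actual API, and the real implementation presumably uses process-based workers; threads just keep the sketch self-contained:

```python
from concurrent.futures import ThreadPoolExecutor

def train_one(seed, device):
    # stand-in for a full MORL training loop (hypothetical)
    return {"seed": seed, "device": device}

def parallel_search(seeds, devices, num_workers):
    # launch one training job per seed, at most num_workers running at once;
    # devices are handed out round-robin so e.g. 4 seeds across
    # cuda:0..cuda:3 each get their own GPU
    with ThreadPoolExecutor(max_workers=num_workers) as ex:
        futures = [
            ex.submit(train_one, seed, devices[i % len(devices)])
            for i, seed in enumerate(seeds)
        ]
        return [f.result() for f in futures]
```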

TODO:

Example Configs on a Slurm Cluster

Using 4 GPUs + 4 workers

#SBATCH --nodes=1 # node count
#SBATCH --ntasks=1 # total number of tasks across all nodes

#SBATCH --cpus-per-task 4 # number of CPU cores per task
#SBATCH -G 4 # number of GPUs

python experiments/hyperparameter_search/launch_sweep.py \
--algo envelope \
--env-id minecart-v0 \
--sweep-count 100 \
--seed 10 \
--num-seeds 4 \
--num-workers 4 \
--devices cuda:0 cuda:1 cuda:2 cuda:3 

Using 4 CPUs + 4 workers

#SBATCH --nodes=1 # node count
#SBATCH --ntasks=1 # total number of tasks across all nodes

#SBATCH --cpus-per-task 4 # number of CPU cores per task

python experiments/hyperparameter_search/launch_sweep.py \
--algo envelope \
--env-id minecart-v0 \
--sweep-count 100 \
--seed 10 \
--num-seeds 4 \
--num-workers 4 

Each worker will use the `auto` device setting, and each algorithm instance will then default to `cpu` since CUDA is not available.
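The `auto` fallback can be illustrated with a minimal sketch; `resolve_device` is a hypothetical helper, and CUDA availability is passed in explicitly so the example does not depend on a GPU install:

```python
def resolve_device(requested="auto", cuda_available=False):
    # "auto" resolves to cuda when a GPU is visible, otherwise to cpu;
    # explicit devices like "cuda:2" are passed through unchanged
    if requested == "auto":
        return "cuda" if cuda_available else "cpu"
    return requested
```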

Example Runs on a Slurm Cluster

Example Runs:

| Workers | CPUs | GPUs | CPU Usage | GPU Usage | Sweeps |
|---|---|---|---|---|---|
| 4 | 4 | 0 | 94.88% | N/A | 18 |
| 4 | 1 | 0 | 95.55% | N/A | 15 |
| 1 | 1 | 0 | 25.03% | N/A | 15 |
| 4 | 4 | 1 | 18.78% | 99% | 5 |
| 4 | 4 | 4 | 31.07% | 9% / 11% / 12% / 10% | 5 |
| 4 | 1 | 4 | 98.85% | 4% / 5% / 5% / 5% | 13 |
lowlypalace commented 8 months ago

Any idea why black is complaining? I've run the linter on my end.