Feature/hpo - Githubissues

Recreating from #57

Solves #13

Feature Description

This feature introduces a new script to perform a sweep of multi-objective reinforcement learning (MORL) algorithms and environments. The script runs a series of experiments, collects performance metrics, and logs the results to Weights & Biases (W&B).

Training runs with multiple seeds in parallel: a ProcessPoolExecutor runs one agent per seed concurrently. Training over several seeds accounts for the variability in the learning process and gives a more reliable evaluation of each algorithm's performance. The hypervolume is averaged across the seeds, and that average is logged to W&B.
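As a rough illustration of the per-seed parallelism described above, here is a minimal sketch. The `train` function is a hypothetical stand-in that returns a made-up hypervolume; the `run_seeds` helper and its `executor_cls` parameter are assumptions for this example and are not taken from the PR.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from statistics import mean


def train(seed: int) -> float:
    """Hypothetical stand-in for the real training routine."""
    # The real function would build the agent, train it with this seed,
    # and return the hypervolume of the resulting front.
    return 100.0 + (seed % 7)


def run_seeds(train_fn, seeds, executor_cls=ProcessPoolExecutor):
    """Run train_fn once per seed in a pool; return (results, average)."""
    with executor_cls(max_workers=len(seeds)) as executor:
        # executor.map yields results in the same order as `seeds`.
        hypervolumes = list(executor.map(train_fn, seeds))
    return hypervolumes, mean(hypervolumes)


if __name__ == "__main__":
    # One process per seed, as in the description above.
    print(run_seeds(train, [0, 1, 2]))
    # Same call with threads, which can be handy when debugging.
    print(run_seeds(train, [0, 1, 2], executor_cls=ThreadPoolExecutor))
```

The `__main__` guard matters with process pools on platforms that spawn workers, since the workers re-import the module.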

Components Description

The main components of the feature are:

Argument parsing: Parse command-line arguments for the algorithm, environment ID, reference point, W&B entity, project name, number of seeds, and training hyperparameters.
Worker classes: Define classes to handle worker setup and results, including WorkerInitData and WorkerDoneData.
Train function: Implement a train function to instantiate the selected algorithm, train the agent, and return the hypervolume metric.
Main function: Initialize W&B, create a process pool of workers, submit tasks to the workers, collect results, compute the average hypervolume, and log the metrics to W&B.
Sweep setup and execution: Load the sweep configuration, set up the sweep with W&B, and run the sweep agent using the main function.

The script allows users to easily perform a sweep of MORL algorithms and environments, exploring different hyperparameters and logging the results to W&B for further analysis.

Usage

An example usage:

python experiments/hyperparameter_search/launch_sweep.py \
--algo envelope \
--env-id minecart-v0 \
--ref-point 0 0 -200 \
--sweep-count 100 \
--num-seeds 3 \
--train-hyperparams num_eval_weights_for_front:100 reset_num_timesteps:False eval_freq:10000 total_timesteps:10000
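For readers wondering how flags like these could be consumed, the sketch below parses the example command with `argparse`. It is an assumption about how the parsing might look, not the script's actual code: the defaults, the `parse_hyperparams` helper, and the use of `ast.literal_eval` to type the `key:value` pairs are all hypothetical.

```python
import argparse
import ast


def parse_hyperparams(pairs):
    """Turn ["key:value", ...] into a dict with Python-typed values."""
    parsed = {}
    for pair in pairs:
        key, sep, raw = pair.partition(":")
        if not sep or not key:
            raise ValueError(f"expected key:value, got {pair!r}")
        try:
            # "100" -> 100, "False" -> False, "1e-3" -> 0.001
            parsed[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            parsed[key] = raw  # anything else stays a plain string
    return parsed


def build_parser():
    # Flag names follow the usage example; defaults are assumptions.
    parser = argparse.ArgumentParser()
    parser.add_argument("--algo", required=True)
    parser.add_argument("--env-id", required=True)
    parser.add_argument("--ref-point", type=float, nargs="+", required=True)
    parser.add_argument("--sweep-count", type=int, default=10)
    parser.add_argument("--num-seeds", type=int, default=3)
    parser.add_argument("--train-hyperparams", nargs="*", default=[])
    return parser


example_args = build_parser().parse_args([
    "--algo", "envelope",
    "--env-id", "minecart-v0",
    "--ref-point", "0", "0", "-200",
    "--sweep-count", "100",
    "--num-seeds", "3",
    "--train-hyperparams",
    "num_eval_weights_for_front:100", "reset_num_timesteps:False",
    "eval_freq:10000", "total_timesteps:10000",
])
example_hyperparams = parse_hyperparams(example_args.train_hyperparams)
```

Typing the values this way keeps the command line compact while still passing integers and booleans, rather than strings, to the training call.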

The config files that define the hyperparameter ranges for the sweep should be placed in the configs directory and named after the corresponding algorithm, e.g. envelope.yaml.
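To give a sense of what such a config could contain, the sketch below writes one as a Python dict in the W&B sweep format, i.e. the structure a YAML file would load into. The hyperparameter names, their ranges, the `bayes` search method, and the `avg_hypervolume` metric name are illustrative assumptions, not values taken from envelope.yaml; `make_sweep_config` and `launch_sweep` are hypothetical helpers.

```python
def make_sweep_config(algo: str) -> dict:
    """Build an illustrative W&B sweep config (all values are made up)."""
    return {
        "name": f"{algo}-sweep",
        "method": "bayes",
        # The metric name must match the key that the run logs to W&B.
        "metric": {"name": "avg_hypervolume", "goal": "maximize"},
        "parameters": {
            "learning_rate": {"distribution": "uniform", "min": 1e-4, "max": 1e-2},
            "batch_size": {"values": [32, 64, 128]},
        },
    }


def launch_sweep(config, main_fn, entity, project, count):
    """Register the sweep with W&B and run `count` trials of main_fn."""
    import wandb  # lazy import: building the config needs no W&B install

    sweep_id = wandb.sweep(sweep=config, entity=entity, project=project)
    wandb.agent(sweep_id, function=main_fn, entity=entity,
                project=project, count=count)


envelope_config = make_sweep_config("envelope")
```

In this sketch, `count` would correspond to the `--sweep-count` flag from the usage example.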

Other Changes

Additionally, the PR reorganizes the file structure and moves the functions shared by launch_experiment.py and launch_sweep.py into common/experiments.py.

experiments
├── hyperparameter_search
│   ├── launch_sweep.py
│   └── configs
│       ├── envelope.yaml
│       └── pgmorl.yaml
└── benchmark
    └── launch_experiment.py

LucasAlegre / morl-baselines

Feature/hpo #74
