LucasAlegre / morl-baselines

Implementations of multi-objective reinforcement learning algorithms.
https://lucasalegre.github.io/morl-baselines
MIT License
271 stars 44 forks

Hyperparameter optimization #57

Closed lowlypalace closed 10 months ago

lowlypalace commented 1 year ago


Feature Description

This feature introduces a new script to perform a sweep of multi-objective reinforcement learning (MORL) algorithms and environments. The script runs a series of experiments, collects performance metrics, and logs the results to Weights & Biases (W&B).

Training is performed with multiple seeds in parallel, using a ProcessPoolExecutor to run each agent with a different seed concurrently. Training across several seeds accounts for variability in the learning process and gives a more robust evaluation of each algorithm's performance. The hypervolume metric, averaged over the per-seed results, is computed and logged to Weights & Biases.
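The multi-seed setup described above can be sketched as follows. This is a minimal, runnable illustration, not the PR's actual code: `train_agent` is a hypothetical stand-in for a full training run, and the W&B call is shown only as a comment.

```python
from concurrent.futures import ProcessPoolExecutor
from statistics import mean


def train_agent(seed: int) -> float:
    """Hypothetical stand-in for one training run with a given seed.

    In the real script this would build the MORL agent, train it, and
    return the final hypervolume. Here we return a dummy deterministic
    value so the sketch is self-contained and runnable.
    """
    return 100.0 + seed


def run_seeds(seeds: list[int]) -> float:
    """Run one training process per seed and average the hypervolumes."""
    # One worker process per seed, as the PR describes.
    with ProcessPoolExecutor(max_workers=len(seeds)) as pool:
        hypervolumes = list(pool.map(train_agent, seeds))
    avg_hv = mean(hypervolumes)
    # The real script would then log the aggregate to W&B, e.g.:
    # wandb.log({"avg_hypervolume": avg_hv})
    return avg_hv
```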

Components Description

The script lets users perform a sweep over MORL algorithms and environments, exploring different hyperparameters and logging the results to W&B for further analysis.

Usage

Example usage:

python experiments/hyperparameter_search/launch_sweep.py \
--algo envelope \
--env-id minecart-v0 \
--ref-point 0 0 -200 \
--sweep-count 100 \
--num-seeds 3 \
--train-hyperparams num_eval_weights_for_front:100 reset_num_timesteps:False eval_freq:10000 total_timesteps:10000
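The `--train-hyperparams` flag takes space-separated `key:value` pairs. A minimal sketch of how such pairs could be parsed into typed keyword arguments; the helper name and coercion rules are assumptions for illustration, not the PR's actual implementation:

```python
def parse_train_hyperparams(pairs: list[str]) -> dict:
    """Hypothetical parser turning 'key:value' strings into a kwargs dict.

    Values are coerced in order: bool literals, int, float, then left
    as strings if nothing else matches.
    """
    out = {}
    for pair in pairs:
        key, raw = pair.split(":", 1)
        if raw in ("True", "False"):
            value = raw == "True"
        else:
            try:
                value = int(raw)
            except ValueError:
                try:
                    value = float(raw)
                except ValueError:
                    value = raw  # keep as plain string
        out[key] = value
    return out
```

The parsed dict can then be passed directly to the agent's `train()` call as keyword arguments.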

Config files defining the hyperparameter ranges for the sweep should be placed in the configs directory, named after the corresponding algorithm (e.g., envelope.yaml).
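A config in the standard W&B sweep format might look like the following. The parameter names and ranges here are illustrative assumptions, not the actual config shipped with the PR:

```yaml
# Hypothetical envelope.yaml — parameters and ranges are examples only.
method: bayes
metric:
  name: avg_hypervolume
  goal: maximize
parameters:
  learning_rate:
    distribution: log_uniform_values
    min: 0.0001
    max: 0.01
  batch_size:
    values: [32, 64, 128]
```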

Other Changes

Additionally, the PR reorganizes the file structure and moves functions shared by launch_experiment.py and launch_sweep.py into common/experiments.py.

experiments
├── hyperparameter_search
│   ├── launch_sweep.py
│   └── configs
│       ├── envelope.yaml
│       └── pgmorl.yaml
└── benchmark
    └── launch_experiment.py

The PR also replaces the writer.add_scalar() calls with wandb.log() in the log_all_multi_policy_metrics() helper.
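The shape of that logging change can be sketched as below. This is a simplified stand-in, not the actual helper from the PR: `log_fn` substitutes for `wandb.log` so the sketch runs without a W&B account, and only one metric is shown.

```python
def log_all_multi_policy_metrics(hypervolume: float, global_step: int, log_fn) -> None:
    """Sketch of the new logging path.

    Previously each metric went through the TensorBoard writer, e.g.:
        writer.add_scalar("eval/hypervolume", hypervolume, global_step)
    Now the metrics are pushed as a single dict to a wandb.log-style
    callable (`log_fn` stands in for wandb.log here).
    """
    log_fn({"eval/hypervolume": hypervolume, "global_step": global_step})
```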

ffelten commented 10 months ago

TODO:

ffelten commented 10 months ago

Closing in favor of #74