Added a unit test to make sure the registered buffers are present:
python utest_registry.py
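A minimal sketch of what such a registry check might look like (the registry module and its registered_ids() helper below are assumptions, not the actual code in this repo):

import unittest

from registry import registry  # hypothetical import; the real module name may differ


class TestBufferRegistry(unittest.TestCase):
    """Check that both replay buffer implementations are registered."""

    def test_buffers_registered(self):
        # ER-v0 and PER-v0 are the buffer IDs referenced in this PR.
        for buffer_id in ("ER-v0", "PER-v0"):
            self.assertIn(buffer_id, registry.registered_ids())


if __name__ == "__main__":
    unittest.main()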
Loading models works by providing the path to the model directory in the cfg file of the desired agent.
Using a config entry as follows:
"load_model": "/PATH/TO/MODELS/",
The agent will then search that directory for the corresponding model files to load for each network.
Ex. TD3 has 6 models, DDPG has 4.
The code searches the directory for files with the corresponding names and checks that they are in the .h5 format; extra information appended to the name is fine (actor_model_epoch50_40.h5 works just as well as actor_model.h5). A sketch of this lookup is shown below.
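A minimal sketch of such a lookup (the helper name and the commented usage are illustrative assumptions, not the code in this PR):

import glob
import os


def find_model_file(model_dir, model_name):
    """Return the first .h5 file in model_dir whose name starts with model_name."""
    # Matches e.g. actor_model.h5 as well as actor_model_epoch50_40.h5
    matches = sorted(glob.glob(os.path.join(model_dir, model_name + "*.h5")))
    if not matches:
        raise FileNotFoundError(f"No .h5 file for '{model_name}' in {model_dir}")
    return matches[0]


# Hypothetical usage: an agent would call this once per model it needs, e.g.
# find_model_file(cfg["load_model"], "actor_model")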
The new DDPG agent can be selected with the --agent KerasDDPG-v0 command line argument.
Results from Testing:
HalfCheetah:
Walker2D comparison (where TD3 should outperform DDPG):
Experience Replay Buffer (ER-v0) and Prioritized Experience Replay Buffer (PER-v0)
Implementations can be switched in the keras-td3 cfg (ER-v0/PER-v0):
"buffer_type": "ER-v0",
"buffer_type": "PER-v0",
The experience replay buffer is the default.
Buffer size is now controlled in the buffer config files.
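A hypothetical buffer cfg entry for the size (the exact key name is an assumption; check the actual buffer config files):
"buffer_size": 1000000,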
PER has two versions, which can be changed in the associated cfg files (proportional/rank):
"prioritization_type": "proportional",
"prioritization_type": "rank",
Proportional is the default and, following the paper, uses alpha: 0.6 and beta: 0.4. Rank's default values from the paper are alpha: 0.7 and beta: 0.5. A sketch of both prioritization schemes is shown below.
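For reference, a minimal sketch of how the two schemes assign sampling probabilities and importance weights, following the PER paper (Schaul et al.); this is illustrative only, not the code in this PR:

import numpy as np


def per_probabilities(td_errors, alpha, scheme="proportional", eps=1e-6):
    """Sampling probabilities P(i) = p_i^alpha / sum_k p_k^alpha."""
    td_errors = np.asarray(td_errors, dtype=np.float64)
    if scheme == "proportional":
        # Proportional: p_i = |delta_i| + eps  (paper defaults: alpha=0.6, beta=0.4)
        p = np.abs(td_errors) + eps
    else:
        # Rank-based: p_i = 1 / rank(i), rank 1 = largest |delta|  (alpha=0.7, beta=0.5)
        ranks = np.empty_like(td_errors)
        order = np.argsort(-np.abs(td_errors))
        ranks[order] = np.arange(1, len(td_errors) + 1)
        p = 1.0 / ranks
    probs = p ** alpha
    return probs / probs.sum()


def importance_weights(probs, beta):
    """Importance-sampling weights w_i = (N * P(i))^(-beta), normalized by the max."""
    w = (len(probs) * probs) ** (-beta)
    return w / w.max()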
The command line overrides below take precedence over what is present in the cfg files:
--btype
--bsize
btype can be ER-v0 or PER-v0; bsize sets the replay buffer size (e.g. 1000000).
Example script with new command line overrides:
python -O drivers/run_continuous.py --agent KerasTD3-v0 --env HalfCheetah-v4 --btype PER-v0 --bsize 1000000 --nepisodes 1000
The base script still works and just pulls from the cfg files:
python -O drivers/run_continuous.py --agent KerasTD3-v0 --env HalfCheetah-v4 --nepisodes 1000
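A rough sketch of how such overrides could be wired into the driver (the flag names mirror those above; the cfg-merging details are assumptions):

import argparse


def apply_buffer_overrides(cfg):
    """Let --btype/--bsize override the values loaded from the cfg files."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--btype", choices=["ER-v0", "PER-v0"], default=None)
    parser.add_argument("--bsize", type=int, default=None)
    args, _ = parser.parse_known_args()
    if args.btype is not None:
        cfg["buffer_type"] = args.btype
    if args.bsize is not None:
        cfg["buffer_size"] = args.bsize
    return cfg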
The experience replay buffer was shown to mimic the results of the non-registered uniform sampling implementation. In HalfCheetah-v4 no gain was seen from using PER over ER; a future issue will validate the PER implementation against ER on Atari games.
Configuration is automatically saved in the results folder via a function called save_cfg().
This function is abstracted into the base class, meaning that all new modules (agent/buffer/model) will need to implement a save_cfg() function.
At the beginning of the run, save_cfg() copies the configuration of the agent/buffer/model into a folder called cfgs/ in the logdir location. A sketch is shown below.
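A minimal sketch of what such a save_cfg() could look like (the class and attribute names are assumptions; the real base class in this repo may differ):

import os
import shutil


class BaseModule:
    """Hypothetical base class shared by agents, buffers, and models."""

    def __init__(self, cfg_path, logdir):
        self.cfg_path = cfg_path
        self.logdir = logdir

    def save_cfg(self):
        # Copy this module's cfg file into <logdir>/cfgs/ at the start of the run.
        cfg_dir = os.path.join(self.logdir, "cfgs")
        os.makedirs(cfg_dir, exist_ok=True)
        shutil.copy(self.cfg_path, cfg_dir)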
The driver now checks whether the inference reward has increased by at least 5% over the previous best. If so, it creates a folder in the model directory named with the epoch number and the percentage increase, and saves the models in that folder with the same postfix. This was done to reduce clutter in the model folder (TD3 has 6 models to save) and make checkpoints easy to find; a rough sketch is shown below.
The driver also saves the initial model weights in the same directory structure:
Unit tests passing can be seen here:
The directory structure can be seen here with saved models and folders:
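A rough sketch of the 5% improvement check described above (the function, its arguments, and the naming scheme are assumptions; it also assumes positive rewards for simplicity):

import os


def maybe_checkpoint(models, model_dir, epoch, reward, best_reward):
    """Save every model when the inference reward improves by at least 5%."""
    if best_reward is None or reward >= 1.05 * best_reward:
        pct = 0.0 if best_reward is None else 100.0 * (reward - best_reward) / abs(best_reward)
        postfix = f"epoch{epoch}_{pct:.0f}"
        ckpt_dir = os.path.join(model_dir, postfix)
        os.makedirs(ckpt_dir, exist_ok=True)
        for name, model in models.items():  # e.g. the 6 TD3 models
            model.save(os.path.join(ckpt_dir, f"{name}_{postfix}.h5"))
        return reward  # new best reward
    return best_reward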
Using inference is as simple as adding --inference True to your command line arguments.
This argument tells the run command to forgo training and immediately run inference with your selected agent in the environment.
An example run script would be:
python -O run_continuous.py --agent KerasTD3-v0 --env Pendulum-v1 --nepisodes 10 --inference True
This would run 10 episodes of inference on the Pendulum-v1 environment with the KerasTD3 agent.
To utilize saved models for inference or retraining, please do the following:
Add "load_model": "PATH/TO/MODEL_FILES/"
to the appropriate agent's config file.
Results of inference can be visualized via TensorBoard.
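For example (the log directory path is a placeholder; point it at the logdir used for the run):
tensorboard --logdir /PATH/TO/LOGDIR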
Ran baselines successfully; still need to figure out the NaN error with DDPG and PER.
All tests pass after fixing a bug in the critic training function of DDPG (the absolute value of the TD error is required).
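For context, a minimal sketch of the kind of fix this refers to, assuming the TD errors computed during the critic update are fed back to PER as priorities (the function and buffer method names are illustrative, not this repo's API):

import numpy as np


def update_priorities_from_td_error(buffer, indices, q_targets, q_values, eps=1e-6):
    """PER priorities must be non-negative, so they use |TD error|."""
    td_errors = q_targets - q_values
    priorities = np.abs(td_errors) + eps  # taking the absolute value is the required step
    buffer.update_priorities(indices, priorities)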
Documentation was added to the wiki to reflect these changes and further outline the workflow and framework.