Added a unit test to make sure the registered buffers are present:
python utest_registry.py
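A minimal sketch of what such a registry check might look like (the registry module and its registered_ids() helper below are assumptions, not the actual code in this repo):

import unittest

from registry import registry  # hypothetical import; the real module name may differ


class TestBufferRegistry(unittest.TestCase):
    """Check that both replay buffer implementations are registered."""

    def test_buffers_registered(self):
        # ER-v0 and PER-v0 are the buffer IDs referenced in this PR.
        for buffer_id in ("ER-v0", "PER-v0"):
            self.assertIn(buffer_id, registry.registered_ids())


if __name__ == "__main__":
    unittest.main()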
Loading models works by providing the path to the model directory in the cfg file of the desired agent.
Using a config entry as follows:
"load_model": "/PATH/TO/MODELS/",
The agent will then search that directory for the corresponding model files to load for each network.
Ex. TD3 has 6 models, DDPG has 4.
The code searches the directory for files with the corresponding names and checks that they are in the .h5 format; extra information appended to the name is fine (actor_model_epoch50_40.h5 works just as well as actor_model.h5). A sketch of this lookup is shown below.
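A minimal sketch of such a lookup (the helper name and the commented usage are illustrative assumptions, not the code in this PR):

import glob
import os


def find_model_file(model_dir, model_name):
    """Return the first .h5 file in model_dir whose name starts with model_name."""
    # Matches e.g. actor_model.h5 as well as actor_model_epoch50_40.h5
    matches = sorted(glob.glob(os.path.join(model_dir, model_name + "*.h5")))
    if not matches:
        raise FileNotFoundError(f"No .h5 file for '{model_name}' in {model_dir}")
    return matches[0]


# Hypothetical usage: an agent would call this once per model it needs, e.g.
# find_model_file(cfg["load_model"], "actor_model")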
The new DDPG agent can be selected with the --agent KerasDDPG-v0 command line argument.
Results from Testing:
HalfCheetah:
Walker2D comparison (where TD3 should outperform DDPG):
Experience Replay Buffer (ER-v0) and Prioritized Experience Replay Buffer (PER-v0)
Implementations can be switched in the keras-td3 cfg (ER-v0/PER-v0):
"buffer_type": "ER-v0",
"buffer_type": "PER-v0",
The experience replay buffer is the default.
Buffer size is now controlled in the buffer config files.
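A hypothetical buffer cfg entry for the size (the exact key name is an assumption; check the actual buffer config files):
"buffer_size": 1000000,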
PER has two versions, which can be changed in the associated cfg files (proportional/rank):
"prioritization_type": "proportional",
"prioritization_type": "rank",
Proportional is the default and, following the paper, uses alpha: 0.6 and beta: 0.4. Rank's default values from the paper are alpha: 0.7 and beta: 0.5. A sketch of both prioritization schemes is shown below.
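For reference, a minimal sketch of how the two schemes assign sampling probabilities and importance weights, following the PER paper (Schaul et al.); this is illustrative only, not the code in this PR:

import numpy as np


def per_probabilities(td_errors, alpha, scheme="proportional", eps=1e-6):
    """Sampling probabilities P(i) = p_i^alpha / sum_k p_k^alpha."""
    td_errors = np.asarray(td_errors, dtype=np.float64)
    if scheme == "proportional":
        # Proportional: p_i = |delta_i| + eps  (paper defaults: alpha=0.6, beta=0.4)
        p = np.abs(td_errors) + eps
    else:
        # Rank-based: p_i = 1 / rank(i), rank 1 = largest |delta|  (alpha=0.7, beta=0.5)
        ranks = np.empty_like(td_errors)
        order = np.argsort(-np.abs(td_errors))
        ranks[order] = np.arange(1, len(td_errors) + 1)
        p = 1.0 / ranks
    probs = p ** alpha
    return probs / probs.sum()


def importance_weights(probs, beta):
    """Importance-sampling weights w_i = (N * P(i))^(-beta), normalized by the max."""
    w = (len(probs) * probs) ** (-beta)
    return w / w.max()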
The command line overrides below take precedence over what is present in the cfg files:
--btype
--bsize
btype can be ER-v0 or PER-v0; bsize sets the replay buffer size (e.g. 1000000).
Example script with new command line overrides:
python -O drivers/run_continuous.py --agent KerasTD3-v0 --env HalfCheetah-v4 --btype PER-v0 --bsize 1000000 --nepisodes 1000
The base script still works and just pulls from the cfg files:
python -O drivers/run_continuous.py --agent KerasTD3-v0 --env HalfCheetah-v4 --nepisodes 1000
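A rough sketch of how such overrides could be wired into the driver (the flag names mirror those above; the cfg-merging details are assumptions):

import argparse


def apply_buffer_overrides(cfg):
    """Let --btype/--bsize override the values loaded from the cfg files."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--btype", choices=["ER-v0", "PER-v0"], default=None)
    parser.add_argument("--bsize", type=int, default=None)
    args, _ = parser.parse_known_args()
    if args.btype is not None:
        cfg["buffer_type"] = args.btype
    if args.bsize is not None:
        cfg["buffer_size"] = args.bsize
    return cfg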
The experience replay buffer was shown to mimic the results of the non-registered uniform sampling implementation. In HalfCheetah-v4 no gain was seen from using PER over ER; a future issue will validate the PER implementation against ER on Atari games.
Configuration is automatically saved in the results folder via a function called save_cfg().
This function is abstracted into the base class, meaning that all new modules (agent/buffer/model) will need to implement a save_cfg() function.
At the beginning of the run, save_cfg() copies the configuration of the agent/buffer/model into a folder called cfgs/ in the logdir location. A sketch is shown below.
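A minimal sketch of what such a save_cfg() could look like (the class and attribute names are assumptions; the real base class in this repo may differ):

import os
import shutil


class BaseModule:
    """Hypothetical base class shared by agents, buffers, and models."""

    def __init__(self, cfg_path, logdir):
        self.cfg_path = cfg_path
        self.logdir = logdir

    def save_cfg(self):
        # Copy this module's cfg file into <logdir>/cfgs/ at the start of the run.
        cfg_dir = os.path.join(self.logdir, "cfgs")
        os.makedirs(cfg_dir, exist_ok=True)
        shutil.copy(self.cfg_path, cfg_dir)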
The driver now checks whether the inference reward has increased by at least 5% over the previous best. If so, it creates a folder in the model directory named with the epoch number and the percentage increase, and saves the models in that folder with the same postfix. This was done to reduce clutter in the model folder (TD3 has 6 models to save) and make checkpoints easy to find; a rough sketch is shown below.
The driver also saves the initial model weights in the same directory structure:
Unit tests passing can be seen here:
The directory structure can be seen here with saved models and folders:
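A rough sketch of the 5% improvement check described above (the function, its arguments, and the naming scheme are assumptions; it also assumes positive rewards for simplicity):

import os


def maybe_checkpoint(models, model_dir, epoch, reward, best_reward):
    """Save every model when the inference reward improves by at least 5%."""
    if best_reward is None or reward >= 1.05 * best_reward:
        pct = 0.0 if best_reward is None else 100.0 * (reward - best_reward) / abs(best_reward)
        postfix = f"epoch{epoch}_{pct:.0f}"
        ckpt_dir = os.path.join(model_dir, postfix)
        os.makedirs(ckpt_dir, exist_ok=True)
        for name, model in models.items():  # e.g. the 6 TD3 models
            model.save(os.path.join(ckpt_dir, f"{name}_{postfix}.h5"))
        return reward  # new best reward
    return best_reward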
Using inference is as simple as adding --inference True to your command line arguments.
This argument tells the run command to forgo training and immediately run inference with your selected agent in the environment.
An example run script would be:
python -O run_continuous.py --agent KerasTD3-v0 --env Pendulum-v1 --nepisodes 10 --inference True
This would run 10 episodes of inference on the Pendulum-v1 environment with the KerasTD3 agent.
To utilize saved models for inference or retraining, please do the following:
Add "load_model": "PATH/TO/MODEL_FILES/"
to the appropriate agent's config file.
Results of inference can be visualized via TensorBoard.
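For example (the log directory path is a placeholder; point it at the logdir used for the run):
tensorboard --logdir /PATH/TO/LOGDIR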
Ran baselines successfully; still need to figure out the NaN error with DDPG and PER.
All tests pass after fixing a bug in the critic training function of DDPG (the absolute value of the TD error is required).
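For context, a minimal sketch of the kind of fix this refers to, assuming the TD errors computed during the critic update are fed back to PER as priorities (the function and buffer method names are illustrative, not this repo's API):

import numpy as np


def update_priorities_from_td_error(buffer, indices, q_targets, q_values, eps=1e-6):
    """PER priorities must be non-negative, so they use |TD error|."""
    td_errors = q_targets - q_values
    priorities = np.abs(td_errors) + eps  # taking the absolute value is the required step
    buffer.update_priorities(indices, priorities)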
Documentation was added to the wiki to reflect these changes and further outline the workflow and framework.