- main.m: Performs a test run to check that the code works. It runs four code files sequentially: it trains an agent for just 100 episodes, stores it in the \results folder, validates it against the PID controller, performs stability analysis (on an existing transfer-function data file stored in the \data folder), and stores the resulting plots in the \results folder.
- code_DDPG_Training.m: Training code that uses DDPG to train an agent in stages. Uses sm_DDPG_Training_Circuit.slx. This file is run iteratively, using Graded Learning to continue training the previously stored model and enhance its "learning".
- sm_DDPG_Training_Circuit.slx: Simulink model to train the agent to control a non-linear valve model.
- sm_Experimental_Setup.slx: Simulink model to compare the DDPG agent controller with PID, and to experiment with various noise signals and noise sources.
- code_Experimental_Setup.m: Loads a pre-trained model (RL controller) and runs it to see the effect. Uses sm_Experimental_Setup.slx.
- code_SA_TF_Estimator.m: Estimates a transfer function for the RL controller to perform stability analysis.
- sm_StabilityStudy.slx: Simulink model used to estimate the transfer function.

The paper https://doi.org/10.1016/j.mlwa.2021.100030 explores RL for optimum control of non-linear systems.
We use the DDPG (Deep Deterministic Policy-Gradient) algorithm to control a non-linear valve modelled on di Capaci and Scali (2018). While the code and paper use valves as the 'plant', the method and code are easily adaptable to any industrial plant.
Challenges associated with Reinforcement Learning (RL) are outlined in the paper. The paper explores "Graded Learning" to assist in efficiently training an RL agent. We decompose the training task into simpler objectives and train the agent in stages. The Graded Learning parameters will be based on your process and plant.
Note that Graded Learning is the simplest (and most application/practice-oriented) form of Curriculum Learning (Narvekar et al., 2020).
The paper and code use the following elements as the controlled system:
G(s) = k * exp(-L.s) / (1 + T.s)
where k = 3.8163, T = 156.46, and L = 2.5 is the time-delay parameter
Static friction or stiction: fS = 8.40
Dynamic friction: fD = 3.5243
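As a rough illustration, the process transfer function above can be written down with the Control System Toolbox. This is only a sketch: the README does not say that the repository code builds the plant this way, the variable names are illustrative, and the valve friction terms fS and fD are not part of this linear model.

```matlab
% Sketch only: FOPTD process G(s) = k*exp(-L*s)/(1 + T*s).
% Assumes the Control System Toolbox is installed.
k = 3.8163;   % process gain
T = 156.46;   % time constant
L = 2.5;      % time delay

% tf() takes numerator and denominator coefficients in descending
% powers of s; 'InputDelay' adds the exp(-L*s) dead-time term.
G = tf(k, [T 1], 'InputDelay', L);

% Open-loop step response of the process alone (no valve, no controller).
step(G);
```

The step response should settle towards the gain k, which is a quick sanity check that the parameters were entered correctly.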
To train the agent, launch the Simulink model sm_DDPG_Training_Circuit.slx, ensure the variables are correctly set in the code file code_DDPG_Training.m, and execute the code.
Review/set the following global and "Graded Learning" variables:
- MODELS_PATH: Points to your base path for storing the models. Leave it as 'models' and the code will create the folder if it does not exist.
- VERSION: Version suffix for your model, say "V1" or "Grade-1". Change this at each stage of the training process so that a new model is created.
- VALVE_SIMULATION_MODEL: Set to the Simulink model 'sm_DDPG_Training_Circuit'. If you rename the model, set the new name here.
- USE_PRE_TRAINED_MODEL = false: To train the first model, or to train only a single model, set to false. To continue training a pre-trained model, i.e. to apply Graded Learning, set USE_PRE_TRAINED_MODEL = true.
- PRE_TRAINED_MODEL_FILE = 'Grade_I.mat': Set to the file name of the previous-stage model. The example shown here uses the Grade_I model, to continue training the agent and create a Grade_II model.
- MAX_EPISODES = 1000: The maximum number of episodes a training round lasts. Reduce this initially if you only want to test the code; however, training a stable agent requires on the order of 1000 episodes.

Next set the Graded Learning parameters:
Graded Learning: We trained the agent in SIX stages (Grade-I to Grade-VI) by successively increasing the difficulty of the task. The parameters will be based on your process and plant. For this code, we used the following:
- TIME_DELAY = Time-delay parameter (L) of the FOPTD process. Set as 0.1, 0.5, 1.5, 2.0 and 2.5.
- fS = Non-linear valve stiction. We use the following stages: 1/10th of 8.4, followed by 1/5th, 1/2, 2/3 and finally the full 8.4.
- fD = Non-linear valve dynamic friction. We used the same fractions as for fS, finally ending with the actual value of 3.5243.

Suggested Graded Learning stages:
- GRADE_I: TIME_DELAY=0.1; fS = 8.4/10; fD = 3.5243/10
- GRADE_II: TIME_DELAY=0.5; fS = 8.4/5; fD = 3.5243/5
- GRADE_III: TIME_DELAY=1.5; fS = 8.4/2; fD = 3.5243/2
- GRADE_IV: TIME_DELAY=1.5; fS = 8.4/1.5; fD = 3.5243/1.5
- GRADE_V: TIME_DELAY=2.0; fS = 8.4/1.5; fD = 3.5243/1.5
- GRADE_VI: TIME_DELAY=2.5; fS = 8.4/1.0; fD = 3.5243/1.0
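The suggested stages can also be kept in a small lookup table so that each training round picks its parameters by grade number. The sketch below is one possible way to do this and is not taken from code_DDPG_Training.m; the names stageTable, grade, FS_FULL and FD_FULL are hypothetical.

```matlab
% Sketch only: look up Graded Learning parameters by stage number.
% FS_FULL and FD_FULL are the full-difficulty values from this README.
FS_FULL = 8.4;      % static friction (stiction)
FD_FULL = 3.5243;   % dynamic friction

% Columns: time delay L, friction divisor (fS and fD are divided by it).
% Rows mirror the suggested stages GRADE_I to GRADE_VI.
stageTable = [ ...
    0.1, 10.0;  ... % GRADE_I
    0.5,  5.0;  ... % GRADE_II
    1.5,  2.0;  ... % GRADE_III
    1.5,  1.5;  ... % GRADE_IV
    2.0,  1.5;  ... % GRADE_V
    2.5,  1.0];     % GRADE_VI

grade = 2;  % hypothetical selector: 1 = GRADE_I ... 6 = GRADE_VI

TIME_DELAY = stageTable(grade, 1);
fS = FS_FULL / stageTable(grade, 2);
fD = FD_FULL / stageTable(grade, 2);

% Only the first stage trains from scratch; later stages continue
% from the model stored by the previous stage.
USE_PRE_TRAINED_MODEL = grade > 1;

fprintf('Grade %d: TIME_DELAY=%.1f, fS=%.4f, fD=%.4f\n', ...
        grade, TIME_DELAY, fS, fD);
```

Keeping the divisor in one column means fS and fD always use the same fraction, which matches the stages listed above.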
To experiment with a trained RL controller/agent, launch the Simulink model sm_Experimental_Setup.slx, ensure the variables are correctly set in the code file code_Experimental_Setup.m, and execute the code.
Variables to be set:
- MODELS_PATH: Points to your base path for storing the models. Default 'models/'.
- VALVE_SIMULATION_MODEL = sm_Experimental_Setup: Points to the Simulink model used for validation against PID and for experimenting with different noise sources etc.
- PRE_TRAINED_MODEL_FILE = 'Grade_V.mat': Pre-trained model (RL controller) to be tested or validated. The example shows a model called Grade_V.mat.
- TIME_DELAY, fS (stiction) and fD (dynamic friction): Variables that represent the physical parameters.

Stability Analysis of the RL controller. Note that the "System Identification Toolbox" must be installed to estimate transfer functions.
Steps:
- sm_DDPG_Training_Circuit.slx: Simulink: Non-linear valve model, DDPG agent controller.
- code_DDPG_Training.m: MATLAB code: Create a DDPG agent and train it using Graded Learning.
- sm_Experimental_Setup.slx: Simulink: Non-linear valve model, DDPG agent controller. Experiment with noise signals and sources.
- code_Experimental_Setup.m: MATLAB code: Load trained models of choice and run them to see the effects on the scope.
- sm_PID_Tuning.slx: Simulink: Tune a PID controller for comparison.
- code_SA_TF_Estimator.m: MATLAB code: Estimate a transfer function for the RL controller.
- code_SA_Utilities.m: A small utilities file for plotting etc.
- sm_StabilityStudy.slx: Simulink: Model to estimate the transfer function for the RL controller.
- data_SA_TransferFunctions.mat: Data file: Stores the transfer function.
- data_TransferFunctions_NP3_NZ1.mat: Data file: Stores the transfer function (example transfer function with 3 poles and 1 zero).

Note that the "System Identification Toolbox" must be installed to estimate transfer functions.
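For orientation, the general idea of fitting a transfer function with 3 poles and 1 zero and then inspecting its poles can be sketched as follows. This assumes the System Identification Toolbox function tfest is used; the README does not state which function code_SA_TF_Estimator.m calls. The data is synthetic, generated from a made-up system, and is not data from the RL controller.

```matlab
% Sketch only: fit a 3-pole, 1-zero transfer function to input/output
% data and inspect its poles. Assumes the System Identification and
% Control System Toolboxes are installed.
rng(0);                          % reproducible excitation signal
Ts = 0.1;                        % hypothetical sample time
t  = (0:Ts:100)';                % time vector
u  = sign(randn(size(t)));       % random binary input

sysTrue = tf([2 1], [1 3 3 1]);  % hypothetical system: 3 poles, 1 zero
y = lsim(sysTrue, u, t);         % simulated output

data   = iddata(y, u, Ts);       % package output, input and sample time
sysEst = tfest(data, 3, 1);      % estimate with 3 poles and 1 zero

p = pole(sysEst);                % poles of the estimated model
disp(p);

% A continuous-time model is stable when every pole has a negative
% real part.
fprintf('All poles in left half-plane: %d\n', all(real(p) < 0));
```

With real controller data, y and u would come from the logged signals of the Simulink model rather than from lsim.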
Please cite as:
@article{SIRASKAR2021100030,
title = {Reinforcement learning for control of valves},
journal = {Machine Learning with Applications},
author = {Rajesh Siraskar},
year = {2021},
issn = {2666-8270},
doi = {https://doi.org/10.1016/j.mlwa.2021.100030},
url = {https://www.sciencedirect.com/science/article/pii/S2666827021000116}
}
Narvekar, S., Peng, B., Leonetti, M., Sinapov, J., Taylor, M.E., Stone, P., 2020. Curriculum learning for reinforcement learning domains: A framework and survey. arXiv preprint arXiv:2003.04960
di Capaci, R.B., Scali, C., 2018. An augmented PID control structure to compensate for valve stiction. IFAC-PapersOnLine 51, 799–804.