DeepWok / mase

Machine-Learning Accelerator System Exploration Tools

Group 11: Reinforcement Learning for MASE Search action #77

Closed: Yanzhou-Jin closed this issue 3 months ago

Yanzhou-Jin commented 3 months ago

Introduction to the program

Our program implements mixed-precision search based on reinforcement learning. It offers three operation modes ('train', 'load', and 'continue-training'), which are introduced below.

To make parameter adjustment simpler, configuration is done through a configuration file, following the example of the Optuna search. The vgg7_rl_search.toml file was created to store the reinforcement-learning search parameters for the VGG7 model. These parameters mainly cover quantization and the reinforcement-learning setup. The names and value ranges of the main parameters are shown in the table below:

| Name | Value | Name | Value |
|------|-------|------|-------|
| X_width | Integer array | X_frac_width | Integer array |
| algorithm | 'a2c', 'ppo' | load_path | String |
| device | 'cpu', 'cuda' | save_name | String |
| env | mixed_precision | mode | 'load', 'train', 'continue-training' |
| total_timesteps | Integer | | |

In the table, X_width and X_frac_width denote quantization parameters such as bias_width and bias_frac_width, each represented as a multi-dimensional integer array of candidate bit-widths. algorithm is the policy used in reinforcement learning, load_path specifies the file path of a stored reinforcement-learning model and is typically used in the load or continue-training modes, device is the processor that runs the reinforcement-learning algorithm, and save_name is the name under which the trained model is stored. mode selects how the program runs: 'train' trains from scratch, 'load' loads the model at load_path, and 'continue-training' loads the load_path model and continues training it. total_timesteps is the total number of time steps in the training process.
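For illustration, the relevant entries of the configuration file might look like the sketch below. This is not the shipped vgg7_rl_search.toml: the section names are assumptions modeled on the style of MASE's Optuna search examples, and all values are placeholders; only the parameter names come from the table above.

```toml
# Illustrative sketch of the reinforcement-learning search config.
# Section names and values are assumptions, not the shipped defaults.
[search.strategy]
name = "rl"
env = "mixed_precision"             # search environment
algorithm = "ppo"                   # or "a2c"
device = "cpu"                      # or "cuda"
mode = "train"                      # 'train', 'load', or 'continue-training'
total_timesteps = 100_000
save_name = "vgg7_ppo"              # name under which the trained agent is saved
load_path = "checkpoints/vgg7_ppo"  # used by 'load' / 'continue-training'

[search.search_space.quantize]
# Each X_width / X_frac_width pair lists candidate bit-widths the agent
# can choose from, e.g. for the bias parameters:
bias_width = [4, 8, 16]
bias_frac_width = [2, 4, 8]
```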

Program Execution

The program can be executed from the terminal with the following command. The execution mode can be switched by editing the configuration file located at mase/machop/configs/examples/vgg7_rl_search.toml:

```bash
./ch search --config /path/to/toml-file --load /path/to/check-point
```

Main Modification to the Code

The modifications are concentrated in three files: core_algorithm.py, env.py, and quantize.py. The first two implement the core reinforcement-learning algorithm and define the environment, respectively. quantize.py was modified to fix a specific problem: each time a quantization operation was performed, the program did not follow the settings specified in the configuration file, so the average bit-width remained unchanged. To solve this, the graph_iterator_quantize_by_type2 function was copied into quantize.py and one line of code was changed, as shown below:

```python
# Original code: looks up the config by mase op, so every node of the
# same operator type (e.g. every linear layer) shares one entry.
node_config = get_config(config, get_mase_op(node))

# Modified code: looks up the config by node name, so each node can
# carry the per-layer precision chosen by the RL agent.
node_config = get_config(config, node.name)
```
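To see why the one-line change matters, here is a minimal, self-contained sketch of the lookup behaviour. It is not the MASE source: the get_config shown here, its fallback to a "default" entry, and the node names are assumptions for illustration only.

```python
# Minimal sketch (not the MASE source) of the config lookup affected by
# the one-line change. Assumes get_config falls back to a "default"
# entry when the requested key is missing.
def get_config(config: dict, name: str) -> dict:
    entry = config.get(name, config["default"])
    return entry["config"]

config = {
    "default": {"config": {"weight_width": 8, "weight_frac_width": 4}},
    # Hypothetical per-node entries written from the RL agent's action,
    # keyed by node name:
    "feature_layers_0": {"config": {"weight_width": 4, "weight_frac_width": 2}},
    "feature_layers_3": {"config": {"weight_width": 16, "weight_frac_width": 8}},
}

# Keyed by mase op ("conv2d"): no per-node entry matches, every node
# falls back to the default, and the average bit-width never moves.
print(get_config(config, "conv2d"))            # {'weight_width': 8, ...}

# Keyed by node name: each layer receives the precision the agent chose.
print(get_config(config, "feature_layers_0"))  # {'weight_width': 4, ...}
print(get_config(config, "feature_layers_3"))  # {'weight_width': 16, ...}
```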

Through the above modifications, the reinforcement-learning search action is integrated into the MASE system. Users can now run the search from the terminal and, by configuring the *.toml file appropriately, train, load, or continue training the reinforcement-learning model.