GiacomoPracucci / RL-edge-computing

Reinforcement learning for load distribution in a decentralized Edge environment. This is the implementation of my Master's thesis project for the Data science course (October 2023).
Apache License 2.0
Topics: deep-reinforcement-learning, edge-computing, neat, ppo, reinforcement-learning, sac

Reinforcement Learning for load distribution in a decentralized Edge environment

Description

The project implements the SAC (Soft Actor-Critic) and PPO (Proximal Policy Optimization) deep reinforcement learning algorithms, as well as the evolutionary algorithm NEAT (NeuroEvolution of Augmenting Topologies), to optimize workload management in an Edge Computing system (DFaaS). The goal is to find the optimal policy for processing requests locally, forwarding them to other edge nodes, or rejecting them, based on the system conditions. The current implementation still makes simplifying assumptions compared to the real scenario.

In the simulated environment, the agent receives a stream of incoming requests whose volume varies over time. At each step, it must decide how many of these requests to process locally, how many to forward to another edge node, and how many to reject.

The action space is a three-dimensional continuous box in which the dimensions correspond to the proportions of requests that are processed locally, forwarded, and rejected, respectively.
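As a minimal sketch, assuming the Gymnasium API, the action space and the mapping from a raw action to request counts could look like the following (the function name and the normalization by the component sum are illustrative assumptions, not the repository's actual code):

```python
import numpy as np
import gymnasium as gym

# Illustrative action space: three proportions in [0, 1]
# (local processing, forwarding, rejection).
action_space = gym.spaces.Box(low=0.0, high=1.0, shape=(3,), dtype=np.float32)

def split_requests(action: np.ndarray, incoming_requests: int):
    """Turn a raw 3-dimensional action into (local, forwarded, rejected) counts."""
    # Normalize so the three components sum to 1 and can be read as proportions.
    proportions = action / max(float(action.sum()), 1e-8)
    local = int(proportions[0] * incoming_requests)
    forwarded = int(proportions[1] * incoming_requests)
    rejected = incoming_requests - local - forwarded  # remainder
    return local, forwarded, rejected
```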

The observation space consists of four components:

The reward function depends on the actions taken by the agent and on the system state: it awards the most points for processing requests locally, fewer points for forwarding them, and heavily penalizes the agent for rejecting requests and for congesting the queue.
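A minimal sketch of such a reward, with made-up weights chosen only to illustrate the ordering described above (the repository's actual coefficients and congestion check may differ):

```python
def reward(local: int, forwarded: int, rejected: int,
           queue_len: int, queue_capacity: int) -> float:
    """Illustrative reward: local processing pays the most, forwarding less,
    rejections and queue congestion are penalized. The weights are invented
    for this sketch and do not come from the repository."""
    r = 3.0 * local + 1.0 * forwarded - 5.0 * rejected
    if queue_len >= queue_capacity:   # congestion penalty
        r -= 50.0
    return r
```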

Training and test settings

Three training scenarios were defined, distinguished by how the incoming requests are generated and by how the forwarding capacity available towards other nodes is updated.

The idea is to evaluate the results obtained in different working contexts: testing an agent in scenarios other than the one it was trained on makes it possible to assess the generalization capabilities of the algorithms and to detect overfitting. A hypothetical example of scenario-dependent request generation is sketched below.
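The profiles below are purely hypothetical and only illustrate the idea of training and testing under different workload patterns; the repository's actual generation rules and forwarding-capacity updates differ.

```python
import numpy as np

def generate_requests(scenario: int, t: int, rng: np.random.Generator) -> int:
    """Hypothetical request generators, one per training scenario."""
    if scenario == 1:
        # Roughly constant load with Gaussian noise.
        return max(0, int(rng.normal(100, 10)))
    if scenario == 2:
        # Periodic load profile (e.g. day/night cycle) with noise.
        return max(0, int(100 + 50 * np.sin(2 * np.pi * t / 100) + rng.normal(0, 10)))
    # Scenario 3: uniformly random bursts.
    return int(rng.uniform(50, 150))
```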

Best experiment results

The highest reward scores and the best generalization capabilities were achieved by PPO with standard hyperparameters, trained in scenario 2.
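A sketch of how such a run could look with default ("standard") hyperparameters, assuming a Stable-Baselines3-style workflow and a hypothetical EdgeEnv environment class (neither the library choice nor the class name comes from the repository):

```python
from stable_baselines3 import PPO

from dfaas_env import EdgeEnv  # hypothetical module and class, for illustration only

# Train PPO with library-default hyperparameters on scenario 2; the saved
# model can then be evaluated on the other scenarios to check generalization.
train_env = EdgeEnv(scenario=2)
model = PPO("MlpPolicy", train_env, verbose=1)
model.learn(total_timesteps=200_000)
model.save("ppo_scenario2")
```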