baturaysaglam / RIS-MISO-PDA-Deep-Reinforcement-Learning

Joint Transmit Beamforming and Phase Shifts Design with Deep Reinforcement Learning Under the Phase-Dependent Amplitude Model
MIT License
5g deep-reinforcement-learning reconfigurable-intelligent-surfaces


Proceedings are out!

If you use our code/results, please cite the paper.

@INPROCEEDINGS{10283517,
  author={Saglam, Baturay and Gurgunoglu, Doga and Kozat, Suleyman S.},
  booktitle={2023 IEEE International Conference on Communications Workshops (ICC Workshops)}, 
  title={Deep Reinforcement Learning Based Joint Downlink Beamforming and RIS Configuration in RIS-Aided MU-MISO Systems Under Hardware Impairments and Imperfect CSI}, 
  year={2023},
  volume={},
  number={},
  pages={66-72},
  doi={10.1109/ICCWorkshops57953.2023.10283517}
}

PyTorch implementation of the paper *Deep Reinforcement Learning Based Joint Downlink Beamforming and RIS Configuration in RIS-Aided MU-MISO Systems Under Hardware Impairments and Imperfect CSI*. The paper has been accepted to the 5th Workshop on Data Driven Intelligence for Networks and Systems (DDINS) at the 2023 IEEE International Conference on Communications (ICC).

For the first time in the literature, we solve a Reconfigurable Intelligent Surface (RIS)-assisted multi-user multiple-input single-output (MISO) system problem under hardware impairments through a machine learning approach. Specifically, the deep reinforcement learning algorithm Soft Actor-Critic (SAC), combined with DISCOVER, is used to tackle the issues induced by the phase-dependent amplitude (PDA) model in RIS-aided systems.

The algorithm is tested and the results are produced on a custom RIS-assisted multi-user MISO environment. Learning curves for the results presented in the paper can be found under ./Learning Curves. Each learning curve is stored as a NumPy array of 20000 instant rewards (shape (20000,)). The corresponding figures can be found under ./Learning Figures. The learning curves depict the instant rewards achieved by the agents over 20000 training steps, averaged over ten random seeds.
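Since each curve is a raw (20000,) array of instant rewards, plotting it directly is noisy; a sliding-window average is a common way to smooth it. Below is a minimal sketch; the synthetic `rewards` array merely stands in for a file loaded from ./Learning Curves (the actual file names are not assumed here).

```python
import numpy as np

def moving_average(rewards, window=100):
    """Smooth an instant-reward curve with a sliding-window mean."""
    kernel = np.ones(window) / window
    return np.convolve(rewards, kernel, mode="valid")

# Synthetic stand-in for np.load("Learning Curves/<curve>.npy"), shape (20000,)
rewards = np.linspace(0.0, 1.0, 20000)
smoothed = moving_average(rewards)
print(smoothed.shape)  # valid convolution: 20000 - 100 + 1 = (19901,)
```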

Pseudocode

The Hyper-Parameter Setting

| Hyper-Parameter | Value |
| --- | --- |
| # of hidden layers (all networks) | $2$ |
| # of units in each hidden layer (all networks) | $256$ |
| Hidden layers activation (all networks) | ReLU |
| Final layer activation (Q-networks) | Linear |
| Final layer activation (actor, explorer) | tanh |
| Learning rate $\eta$ (all networks) | $10^{-3}$ |
| Weight decay (all networks) | None |
| Weight initialization (all networks) | Xavier uniform |
| Bias initialization (all networks) | constant |
| Optimizer (all networks) | Adam |
| Total time steps per training | $20000$ |
| Experience replay buffer size | $20000$ |
| Experience replay sampling method | uniform |
| Mini-batch size | $16$ |
| Discount term $\gamma$ | $1$ |
| Learning rate for target networks $\tau$ (all networks) | $10^{-3}$ |
| Network update interval (all networks) | after each environment step |
| Initial $\alpha$ | $0.2$ |
| Entropy target | $\texttt{-action dimension}$ |
| SAC log standard deviation clipping | $(-20, 2)$ |
| SAC $\epsilon$ | $10^{-6}$ |
| $\beta$-Space Exploration $\lambda$ at time step $t$ | $0.3 - \frac{0.3 \times t}{\texttt{total time steps}}$ |
| $\mu$ (environment-related) | 0 |
| $\kappa$ (environment-related) | 1.5 |
| Channel noise variance $\sigma_{e}^{2}$ (environment-related) | $10^{-2}$ |
| AWGN channel variance $\sigma_{w}^{2}$ (environment-related) | $10^{-2}$ |
| Channel matrix initialization (Rayleigh) (environment-related) | $\mathcal{CN}(0, 1)$ |
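The network rows of the table (2 hidden layers of 256 ReLU units, Xavier-uniform weights, constant bias, Adam with $\eta = 10^{-3}$) can be sketched as a small PyTorch builder. This is an illustrative reconstruction, not the repository's actual code; the input/output dimensions and the zero value for the constant bias are assumptions.

```python
import torch
import torch.nn as nn

def make_mlp(in_dim, out_dim, final_activation=None):
    """Table settings: 2 hidden layers, 256 units each, ReLU activations,
    Xavier-uniform weight init, constant bias init (0 assumed)."""
    layers = [nn.Linear(in_dim, 256), nn.ReLU(),
              nn.Linear(256, 256), nn.ReLU(),
              nn.Linear(256, out_dim)]
    if final_activation is not None:
        layers.append(final_activation)
    net = nn.Sequential(*layers)
    for m in net.modules():
        if isinstance(m, nn.Linear):
            nn.init.xavier_uniform_(m.weight)
            nn.init.constant_(m.bias, 0.0)
    return net

# Q-network ends in a linear layer; actor/explorer end in tanh (dims assumed).
q_net = make_mlp(10, 1)
actor = make_mlp(10, 4, nn.Tanh())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)  # Adam, eta = 1e-3
```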

Computing Infrastructure

The hardware/software model/version affects the stochasticity of DRL agent training through the random seeds, which complicates exact reproduction of the reported results. The following computing infrastructure was used to produce the results.

| Hardware/Software | Model/Version |
| --- | --- |
| Operating System | Ubuntu 18.04.5 LTS |
| CPU | AMD Ryzen 7 3700X 8-Core Processor |
| GPU | Nvidia GeForce RTX 2070 SUPER |
| CUDA | 11.1 |
| Python | 3.8.5 |
| PyTorch | 1.8.1 |
| OpenAI Gym | 0.17.3 |
| MuJoCo | 1.50 |
| Box2D | 2.3.10 |
| NumPy | 1.19.4 |

Run

0. Requirements

```
gym==0.17.3
numpy==1.23.3
torch==1.12.1
```

1. Installing
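A minimal sketch of the installation step, assuming the pinned requirements listed above are installed with pip:

```shell
pip install gym==0.17.3 numpy==1.23.3 torch==1.12.1
```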

2. Register the custom RIS-assisted multi-user MISO environment to OpenAI Gym

Use the environment.py file to register the environment with OpenAI Gym. A tutorial on how to register an environment can be found here.
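For reference, registration with OpenAI Gym generally looks like the sketch below. The environment id (`RISMISO-v0`) and entry point (`environment:RISEnv`) are hypothetical placeholders; the actual class name inside environment.py may differ.

```python
import gym
from gym.envs.registration import register

# Hypothetical id and entry point; adjust to the class defined in environment.py.
register(
    id="RISMISO-v0",
    entry_point="environment:RISEnv",
    max_episode_steps=20000,
)

# The environment could then be instantiated with:
# env = gym.make("RISMISO-v0")
```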

3. Train the model from scratch