baturaysaglam / RIS-MISO-Deep-Reinforcement-Learning

Joint Transmit Beamforming and Phase Shifts Design with Deep Reinforcement Learning
MIT License

What's the difference between sum_rate and reward? #9

Closed Meblus199478 closed 1 year ago

Meblus199478 commented 1 year ago

I see that the reward is defined as the sum rate capacity, but I'm confused about why the sum rate doesn't equal the reward in the figures.

baturaysaglam commented 1 year ago

can you be more specific on that?

Meblus199478 commented 1 year ago

In Figure 4, the sum rate ranges from 5 to 35:

image

However, in Figure 6, the reward stays below 10. As I understand it, the reward should be the sum rate, so the maximum reward should equal the sum rate. Can you help me resolve this?

image

baturaysaglam commented 1 year ago

yes, there's an inconsistency between the two figures. however, note that the two figures were produced with different hyperparameters; otherwise, they would've shown the same results. the authors didn't provide the hyperparameter settings for those particular learning curves, and unfortunately I don't remember which values I used to produce Fig. 6. I've taken another look at the paper but still couldn't find any information.

please let me know if there's anything else, or if you find the hyperparameter values used for Fig. 6.

Meblus199478 commented 1 year ago

Thank you for your reply. However, something looks odd. In Figure 4, when I increase the number of RIS elements (N), the results get worse, unlike what your Figure 4 shows. Below is my current Figure 4, based on your code.

image

baturaysaglam commented 1 year ago

I believe this is expected since you increased the number of users as well. increasing the number of users would degrade the performance.

Meblus199478 commented 1 year ago

Thanks for your explanation. I have changed the configuration so that the only difference is the number of RIS elements (N), as shown below. By the way, I think the sum rate in Figure 4 may be opt_reward rather than current_reward. opt_reward uses the SNR rather than the SINR, and in that case we get a larger sum rate.

image
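To illustrate the SNR/SINR point above, here is a minimal sketch that computes a sum rate both ways from the same effective channel matrix. It is not the repository's code: the function names, the 2x2 example matrix, and the unit noise power are assumptions chosen for illustration. The only point is that dropping the interference term from the denominator can never lower the rate.

```python
import math


def effective_gains(H):
    """H[k][j] is the complex effective channel from beam j to user k.

    Returns the matrix of power gains |H[k][j]|^2.
    """
    return [[abs(h) ** 2 for h in row] for row in H]


def sum_rate(H, noise_power=1.0, include_interference=True):
    """Sum of log2(1 + ratio) over users.

    With include_interference=True the ratio is the SINR;
    with False it is the SNR (interference from other beams ignored).
    """
    gains = effective_gains(H)
    total = 0.0
    for k, row in enumerate(gains):
        signal = row[k]
        if include_interference:
            interference = sum(g for j, g in enumerate(row) if j != k)
        else:
            interference = 0.0
        total += math.log2(1.0 + signal / (interference + noise_power))
    return total


# Hypothetical 2-user effective channel (illustrative values only).
H = [
    [2.0 + 1.0j, 0.5 - 0.5j],
    [0.3 + 0.4j, 1.0 - 2.0j],
]

sinr_rate = sum_rate(H, include_interference=True)
snr_rate = sum_rate(H, include_interference=False)
print(f"SINR-based sum rate: {sinr_rate:.3f}")
print(f"SNR-based sum rate:  {snr_rate:.3f}")
```

If a plotted curve uses the SNR-based quantity while the training reward uses the SINR-based one, the plotted sum rate will sit at or above the reward for the same channel and beamformer.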

baturaysaglam commented 1 year ago

yes, this is what's expected. when you increase the number of RIS elements, you'd obtain more transmission power as well as effective performance.
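A minimal sketch of why more RIS elements can help, under strong simplifying assumptions that do not come from the repository: a single user, a single-antenna transmitter, no direct link, and ideal continuous phase shifts. Each element's phase is set to cancel the phase of its cascaded path, so all reflected paths add coherently and the received power gain cannot shrink as elements are added. The names `aligned_gain` and `draw` are hypothetical.

```python
import cmath
import random


def aligned_gain(h, g):
    """Power gain of the cascaded link when every RIS phase shift
    cancels the phase of its own path h_n * g_n."""
    total = 0j
    for h_n, g_n in zip(h, g):
        cascade = h_n * g_n
        theta = -cmath.phase(cascade)  # phase shift that rotates this path to zero phase
        total += cascade * cmath.exp(1j * theta)
    return abs(total) ** 2


rng = random.Random(0)  # fixed seed so the sketch is repeatable


def draw(n):
    """Draw n complex Gaussian channel coefficients (illustrative channel model)."""
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]


h = draw(32)  # transmitter-to-RIS coefficients
g = draw(32)  # RIS-to-user coefficients

# Use the first n elements of the same channel draw for each RIS size.
gains = {n: aligned_gain(h[:n], g[:n]) for n in (4, 8, 16, 32)}
for n, gain in gains.items():
    print(f"N = {n:2d}: aligned power gain = {gain:.2f}")
```

In the multi-user setting of the issue, interference and imperfectly learned phases mean this monotonic behaviour is an upper-level intuition only, not a guarantee for the trained agent.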

regarding the SNR/SINR, thank you for pointing this out, but I'm not the author of the paper, so I only tried to reproduce the figures as closely as I could. the authors didn't provide much detail, such as which objective they used as the reward, the hyperparameter settings, etc.