On the problem of reward function

Dear author, first of all, thank you for your project, which has given me a lot of inspiration. I have a question about the reward function. I printed the energy, energy penalty, time, and time penalty, and found that the energy and time values differ by several orders of magnitude: time is ≈10^-1 while energy is ≤10^-3. Their weights are both 0.5 (step10). What is the rationale for designing the reward this way? Is it necessary to normalize or standardize the time delay and energy before combining them in the reward function? Also, regarding the weights, whether 0.5 and 0.5 or 1 and 5: were they chosen by repeated experimentation, or is there a more principled way to set them? I would appreciate any guidance you can give.
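To make the scale mismatch concrete, here is a minimal sketch of how equal weights behave with and without rescaling. It is not the repository's reward code: the function names, the `time_scale` and `energy_scale` parameters, and the choice of reference scales (taken from the magnitudes quoted above) are all assumptions for illustration.

```python
def weighted_terms(time_delay, energy, w_time=0.5, w_energy=0.5,
                   time_scale=1.0, energy_scale=1.0):
    """Return the (time, energy) contributions to a weighted-sum penalty.

    Hypothetical helper: each raw value is divided by a reference scale
    before its weight is applied. With both scales at 1.0 this is a plain
    weighted sum of the raw values.
    """
    return (w_time * time_delay / time_scale,
            w_energy * energy / energy_scale)


def time_share(terms):
    """Fraction of the total penalty that comes from the time term."""
    t, e = terms
    return t / (t + e)


# Magnitudes reported in the question: time ~1e-1, energy ~1e-3.
raw = weighted_terms(1e-1, 1e-3)
scaled = weighted_terms(1e-1, 1e-3, time_scale=1e-1, energy_scale=1e-3)

print(f"time share, raw values:    {time_share(raw):.3f}")
print(f"time share, scaled values: {time_share(scaled):.3f}")
```

With raw values the time term accounts for almost all of the penalty, so the 0.5/0.5 weights do not express an equal trade-off. After dividing by typical magnitudes the two terms contribute equally. In practice the reference scales would have to come from the environment (for example, observed maxima or running statistics), which is itself a design choice.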

Thank you for the comment; you have understood the impact well. You are right that choosing a fair balance is a challenge. I ran the experiments under three scenarios, as presented. The README explains the impact of using equal weights of 0.5 and notes that the other experiment should be used for a fair balance. I also explain this in detail in my thesis, which is linked in the README.

The principled approach to balancing the weights is multi-objective RL, which does not require choosing weights but instead learns a vector of rewards, one per measurement, independently. A reference to such a paper is discussed in the thesis and in the arXiv version of the paper, but implementing it is beyond the scope of our research.
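As a rough illustration of what a vector of rewards means, the sketch below keeps the time and energy rewards as separate components and accumulates a discounted return for each, instead of collapsing them into one scalar with fixed weights. It is not the method from the referenced paper and not part of this repository; the function name, the reward tuples, and the discount factor are hypothetical.

```python
def discounted_vector_return(rewards, gamma=0.99):
    """Compute one discounted return per objective.

    rewards: list of (time_reward, energy_reward) tuples, one per step.
    Returns a (time_return, energy_return) tuple. No weights are applied,
    so the two objectives stay on their own scales.
    """
    g_time, g_energy = 0.0, 0.0
    # Walk the episode backwards so each step adds its reward to the
    # discounted return of everything that follows it.
    for r_time, r_energy in reversed(rewards):
        g_time = r_time + gamma * g_time
        g_energy = r_energy + gamma * g_energy
    return g_time, g_energy


# Hypothetical two-step episode with negative rewards (penalties).
episode = [(-1.0, -2.0), (-1.0, -2.0)]
print(discounted_vector_return(episode, gamma=0.5))
```

A multi-objective learner would estimate such per-objective values and defer the trade-off between them, rather than fixing it in the reward through weights.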

I hope this helps. Feel free to ask again. I will respond.


TesfayZ / CCM_MADRL_MEC

On the problem of reward function #8