Closed FO-E closed 1 year ago
I store the instantaneous reward the agent achieves at each time step. Then I use the matplotlib package to generate the figures the way I want them. Feel free to let me know if there's anything else.
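For readers following along, here is a minimal sketch of that workflow: keep one reward per time step in a list, then plot the list with matplotlib. It is not the repository's code. `record_rewards`, `plot_rewards`, and `dummy_step` are hypothetical names, and `dummy_step` just produces a noisy rising curve in place of a real agent/environment step.

```python
import random

_rng = random.Random(0)  # fixed seed so the dummy curve is reproducible


def dummy_step(t):
    """Hypothetical stand-in for one agent/environment step; returns a reward."""
    return 10.0 * (1.0 - 0.99 ** t) + _rng.gauss(0.0, 0.1)


def record_rewards(step_fn, num_steps):
    """Call step_fn(t) once per time step and keep every instantaneous reward."""
    rewards = []
    for t in range(num_steps):
        rewards.append(float(step_fn(t)))
    return rewards


def plot_rewards(rewards, label, path="rewards.png"):
    """Plot one point per time step and save the figure to `path`."""
    # Imported here so the recording helper also works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # file output only, no display window
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(range(1, len(rewards) + 1), rewards, label=label)
    ax.set_xlabel("Time step")
    ax.set_ylabel("Instantaneous reward")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)


rewards = record_rewards(dummy_step, num_steps=1000)
```

Calling `plot_rewards(rewards, label="example run")` would then write the curve to `rewards.png`. In a real run, `dummy_step` would be replaced by whatever returns the reward from the training loop.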
Okay, thanks for your response.
For Figure 4, you set M = 8, K = 8, N = 8 (red plot) and then simulated several transmit powers. For each transmit power, you ran several episodes and time steps and recorded the instantaneous rewards (an array whose size is the number of episodes by the number of time steps). In Figure 4, for example at Pt = 10 dB, how did you compute the sum rate? Did you use the maximum instantaneous reward achieved after training, or the maximum reward after averaging?
I used the instantaneous reward achieved at each time step; that is, each point on a curve is the reward achieved at one particular step.
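To make the two alternatives raised in the question concrete, the sketch below computes both from an episodes-by-time-steps table of rewards. It only illustrates the difference between the two aggregations; it is not the procedure used for the figures, and the function names and reward values are made up for the example.

```python
def max_instantaneous(rewards):
    """Largest single reward over all episodes and time steps."""
    return max(max(episode) for episode in rewards)


def max_of_episode_average(rewards):
    """Average across episodes at each time step, then take the largest average."""
    num_episodes = len(rewards)
    per_step_mean = [sum(column) / num_episodes for column in zip(*rewards)]
    return max(per_step_mean)


# Hypothetical rewards: 2 episodes x 3 time steps.
rewards = [
    [1.0, 4.0, 2.0],
    [3.0, 2.0, 2.0],
]

print(max_instantaneous(rewards))       # 4.0
print(max_of_episode_average(rewards))  # 3.0
```

The two numbers differ because averaging smooths out a single lucky step, so it matters which one is reported as the sum rate for a given transmit power.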
I am sorry to bother you. This is my first time working on an RL project. I want to know what procedure to follow to generate Figures 4 and 5, using your code and training from scratch.