baturaysaglam / RIS-MISO-Deep-Reinforcement-Learning

Joint Transmit Beamforming and Phase Shifts Design with Deep Reinforcement Learning

Variation in channel at each time step. #18

Open Zainmustafajajja opened 10 months ago

Zainmustafajajja commented 10 months ago

We reset the channel matrices (H_1, H_2) in the reset() method of the environment, and reset() is called only at the start of each episode. This means the channel changes once per episode and stays constant across all steps within that episode, since nothing regenerates the channel at each time step. To my knowledge, the channel should change at every time step. Kindly let me know if I am getting something wrong here.
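
To make what I mean concrete, here is a rough sketch (hypothetical names and shapes, not the repo's exact code) of the two placements under discussion: drawing the channel only in reset() versus redrawing it in step():

```python
import numpy as np


class RISEnvSketch:
    """Toy environment skeleton showing where the channel matrices are drawn.

    Names, shapes, and the state/reward placeholders are illustrative only.
    """

    def __init__(self, num_antennas=4, num_elements=16, num_users=4):
        self.M, self.N, self.K = num_antennas, num_elements, num_users
        self.H_1 = None  # BS -> RIS channel, N x M
        self.H_2 = None  # RIS -> users channel, K x N

    def _draw_channels(self):
        # Fresh Rayleigh-fading realisation; calling this is what "the channel changes" means.
        self.H_1 = (np.random.randn(self.N, self.M)
                    + 1j * np.random.randn(self.N, self.M)) / np.sqrt(2)
        self.H_2 = (np.random.randn(self.K, self.N)
                    + 1j * np.random.randn(self.K, self.N)) / np.sqrt(2)

    def _state(self):
        # Placeholder state: stacked real/imaginary parts of both channels.
        return np.concatenate([self.H_1.real.ravel(), self.H_1.imag.ravel(),
                               self.H_2.real.ravel(), self.H_2.imag.ravel()])

    def reset(self):
        # As implemented (per this discussion): channels drawn once per episode, then held fixed.
        self._draw_channels()
        return self._state()

    def step(self, action):
        # If the channel were meant to vary at every time step, the redraw would go here:
        # self._draw_channels()
        reward = 0.0  # placeholder for the achievable sum rate given `action`
        done = False
        return self._state(), reward, done, {}
```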

shahirivahid commented 7 months ago

I think what is implemented in the code is right. Let me put it this way: imagine we change the channel state at each time step, and at the beginning of the next episode we do the same. The question then is: what is the point of defining multiple episodes in training if we update the channel state at every step? Instead, we could simply use a single episode with a very large number of steps. Moreover, I think we need to give the agent a chance, during training, to explore the possible actions for a given channel state so it can find its way to the optimal solution.
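
To make the contrast concrete, here is a rough sketch of the two training-loop structures; the random "agent" and the channel-draw helper are stand-ins for the actual DDPG agent and environment, not code from the repo:

```python
import numpy as np

rng = np.random.default_rng(0)


def draw_channel(n_elements=16, n_antennas=4):
    """Stand-in for one Rayleigh channel realisation (illustrative only)."""
    return (rng.standard_normal((n_elements, n_antennas))
            + 1j * rng.standard_normal((n_elements, n_antennas))) / np.sqrt(2)


def random_action(dim=8):
    """Dummy agent: a real DDPG/TD3 actor would go here."""
    return rng.standard_normal(dim)


# Episodic scheme (as in the repo, per this discussion): one channel realisation
# per episode, held fixed so the agent can explore many actions for it.
num_episodes, steps_per_episode = 3, 5
for episode in range(num_episodes):
    H = draw_channel()                      # new realisation at reset()
    for t in range(steps_per_episode):
        a = random_action()                 # explore actions for this fixed channel
        # ... compute the reward from (H, a) and update the agent ...

# Continuous alternative: a single long run in which the channel is redrawn at
# every step, so episode boundaries add nothing.
total_steps = num_episodes * steps_per_episode
for t in range(total_steps):
    H = draw_channel()                      # fresh realisation every step
    a = random_action()
    # ... compute the reward from (H, a) and update the agent ...
```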

baturaysaglam commented 7 months ago

hi, I do agree with you. some aspects of the experimentation don't make much sense. but I would like to remind you that this is an implementation of an article that is not authored by me. the repo was just for a grad-level course project, and I tried to implement the paper as precisely as possible, adhering to what's reported.

for an experimental setup that makes sense, I suggest you refer to my own paper's repo.

shahirivahid commented 6 months ago

Thank you for your response, dear Baturay. I also looked into your amazing ICC paper, and I learnt a lot from your implementation of the JSAC paper too. Some questions came up while reading these two papers, and I would greatly appreciate it if you could share your expert opinion on them:

  1. It seems that the JSAC paper adopts an episodic approach, while in your ICC paper you consider a continuous approach. Does this mean that both approaches can lead to favorable results?
  2. There are several curves in the JSAC paper that show the average reward versus the number of steps. My intuition is that as we progress through episodes, the performance of our DRL network improves, so these curves should differ from episode to episode. Do you think this is right, and if so, to which episode do the curves in the paper correspond?
  3. I observed that in both the JSAC and ICC papers the transmit and receive powers are included as part of the "state" fed into the actor and critic networks. Since the "state" already contains the channel states and the previous action, why do we need to do this?
  4. I also observed that in both papers the variable transmit power is not among the entries of the "state". Of course, the transmit power is considered as I mentioned in the 3rd question; however, I think P_t is not treated as something random like the channel states, so the network is trained for a specific P_t. How can it find the optimal solutions when we change P_t at the testing stage?

baturaysaglam commented 6 months ago
  1. neither approach inherently leads to more favorable results. the approach you use (either continuous or episodic) is determined by the nature of the task of interest. if there are failure conditions (such as the fall of a robot or a time limit), you have to use an episodic approach because you fail at some point. if there are none, you should stick to a continuous approach. an episodic version of a continuous task could be made, but it wouldn't change much in terms of the algorithm's functionality.
  2. the agent is consistently improved. so, over a horizon of time steps, we would expect the agent to improve. if you use an episodic approach, then the agent is naturally better in later episodes since it keeps improving itself.
  3. I don't think I get this question. everything counts once, as far as I remember. still, I'm the author of the ICC paper only; if there's a point you don't understand about the JSAC paper, I guess you need to contact its authors.
  4. you only test with the $P_t$ that you train the agent on; otherwise it wouldn't be a reasonable assessment. a rough sketch of what I mean follows this list.
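
a minimal, purely illustrative sketch of that last point, with made-up helper names (not from either repo):

```python
def train_agent(p_t_dbm):
    """Hypothetical stand-in for training a DRL agent under a transmit-power budget P_t."""
    return {"trained_p_t_dbm": p_t_dbm}  # a real agent would hold learned network weights


def evaluate(agent, p_t_dbm):
    """Hypothetical evaluation; only meaningful when P_t matches the training budget."""
    assert agent["trained_p_t_dbm"] == p_t_dbm, "test with the same P_t the agent was trained on"
    return 0.0  # placeholder for the measured average sum rate


# one agent per power budget, each assessed under the budget it was trained for
for p_t in (5.0, 10.0, 20.0):  # illustrative P_t values in dBm, not from the papers
    agent = train_agent(p_t)
    score = evaluate(agent, p_t)
```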

hope this helps.

cfsj6 commented 5 months ago

Hello, I would like to ask: given the JSAC method, where a fixed channel is used within each episode to train the neural network, how do we use the trained network, once training is complete, in an actual communication environment where the channel varies at each time step?