Closed Glen9010 closed 2 years ago
If the agent is deployed in the real world and the channel is a fast time-varying stochastic process, does it still need to train for a long time, perhaps one or more episodes? Or is this agent designed for static channels?
To answer the question about channel representation, I suggest you read the paper. I'm not the author of the paper, but all I can say is that if RL is used for learning, then the environment (the channel) is assumed to be represented by a Markov Decision Process, which is a discrete-time stochastic control process.
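To make the MDP assumption concrete, here is a minimal sketch of a time-varying channel modeled as a two-state Markov chain (a Gilbert-Elliott-style model). It is not taken from the paper or from this repository: the class name `TwoStateChannelEnv`, the parameter `p_stay`, the actions, and the reward values are all hypothetical and chosen only for illustration. A smaller `p_stay` corresponds to a faster-varying channel.

```python
import random


class TwoStateChannelEnv:
    """Toy MDP (hypothetical): the channel is a two-state Markov chain."""

    GOOD, BAD = 0, 1

    def __init__(self, p_stay=0.9, seed=0):
        # p_stay: probability the channel keeps its current state each step
        # (assumed parameter; lower values mean a faster-varying channel).
        self.p_stay = p_stay
        self.rng = random.Random(seed)
        self.state = self.GOOD

    def reset(self):
        """Start every episode in the GOOD state and return it."""
        self.state = self.GOOD
        return self.state

    def step(self, action):
        """Apply an action (0 = low rate, 1 = high rate).

        Returns (next_state, reward). The reward values are illustrative:
        a high rate only pays off when the channel is GOOD.
        """
        if action == 1:
            reward = 2.0 if self.state == self.GOOD else 0.0
        else:
            reward = 1.0
        # Markov property: the next state depends only on the current state.
        if self.rng.random() > self.p_stay:
            self.state = 1 - self.state
        return self.state, reward


def run_episode(env, policy, steps=100):
    """Roll out one episode and return the total reward."""
    state = env.reset()
    total = 0.0
    for _ in range(steps):
        state, reward = env.step(policy(state))
        total += reward
    return total


if __name__ == "__main__":
    # A hand-written policy that uses the high rate only in the GOOD state.
    def greedy(state):
        return 1 if state == TwoStateChannelEnv.GOOD else 0

    for p in (0.99, 0.9, 0.5):
        env = TwoStateChannelEnv(p_stay=p, seed=0)
        print(p, run_episode(env, greedy))
```

In this sketch the channel is stochastic and time-varying, but its transition probabilities are fixed, so an agent can in principle learn a policy for the channel statistics rather than for any single realization. Whether the agent in the paper behaves this way is something only the paper can confirm.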
I have downloaded this project and run it. I found that in the first episodes the results can reach 9 or higher, but in later episodes the result is stuck at about 1 to 4. This has puzzled me for a long time.