baturaysaglam / RIS-MISO-Deep-Reinforcement-Learning

Joint Transmit Beamforming and Phase Shifts Design with Deep Reinforcement Learning
MIT License

Some doubts about program run time? #1

Closed TiantianZhang closed 1 year ago

TiantianZhang commented 1 year ago

How long does it take to run this program with the default parameters (`--num_time_steps_per_eps=20000`, `--num_eps=5000`)?

I ran main.py on an RTX 3090 GPU, and one episode takes about 4 minutes. Is this all right?

baturaysaglam commented 1 year ago

it is completely all right. I have an RTX 2070, and a single episode takes approximately 3 minutes and 52 seconds. the GPU doesn't matter much here since there are only two networks, each a simple 2-layer MLP; we would only expect a speedup from a more powerful GPU with much more sophisticated networks. also, the environment itself runs on the CPU.
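to make the point concrete, here is a minimal sketch of what such a 2-layer MLP network looks like, assuming PyTorch; the class name, layer sizes, and dimensions below are illustrative, not the repo's actual ones:

```python
import torch
import torch.nn as nn

# minimal 2-layer MLP actor sketch (hypothetical sizes, not the repo's)
class Actor(nn.Module):
    def __init__(self, state_dim, action_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),  # layer 1
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),  # layer 2
            nn.Tanh(),  # bound actions to [-1, 1]
        )

    def forward(self, state):
        return self.net(state)

actor = Actor(state_dim=100, action_dim=40)
out = actor(torch.randn(8, 100))
print(out.shape)  # torch.Size([8, 40])
```

networks this small are dominated by Python/CPU overhead per step, which is why a faster GPU barely changes the wall-clock time.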

regarding the number of episodes, I tried to stick to the original paper as much as I could, since it states that they trained the agent for 5000 episodes of 20000 time steps each. but I don't think such a high number of episodes is needed, since the agent can already reach a reward of 15.5-16 in its first episode. 5-10 episodes would be good enough, in my opinion.
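following that suggestion, a shorter run could be launched like this (flag names taken from the question above; a sketch, not a verified invocation):

```shell
# train for only 10 episodes instead of the default 5000
python main.py --num_eps 10 --num_time_steps_per_eps 20000
```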

baturaysaglam commented 1 year ago

I've just updated the repo. previously, I followed the original paper, but it contained significant mistakes in both the theory and the implementation. please clone and try the latest version. also, the environment is highly stochastic, so you may not get the exact results I got; it's just reinforcement learning.

baturaysaglam commented 1 year ago

I updated the repository again. there is a significant mistake in the paper, which I refer to in the updated README.md. I resolved the issue, and the computational complexity is now substantially reduced; you can obtain a well-trained agent in only 10,000 time steps. I reproduced the result without changing any default parameter and got the following (smoothed with a sliding window of 100); the agent is trained for only a single episode.

myplot
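for reference, the sliding-window smoothing mentioned above can be sketched like this, assuming the per-step rewards are logged in a 1-D NumPy array (the array here is synthetic stand-in data):

```python
import numpy as np

# moving-average smoothing with a window of 100, as in the plot above
def smooth(rewards, window=100):
    kernel = np.ones(window) / window
    return np.convolve(rewards, kernel, mode="valid")

rewards = np.random.randn(10_000).cumsum()  # stand-in for logged rewards
curve = smooth(rewards)
print(curve.shape)  # (9901,)
```

`mode="valid"` only keeps positions where the full window fits, so the smoothed curve is `window - 1` samples shorter than the raw one.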

baturaysaglam commented 1 year ago

the absolute result differs because of the stochasticity of the environment and of the learning process, such as the network initialization and the channel matrix initialization. to get the exact result I got, you would need to run the code with the precise hardware and software settings, which is impossible. so, I see no remaining issues. I'm closing this issue.

129kk commented 10 months ago

Why can I only generate a single npy file in the Learning Curves folder, instead of multiple regularly named ones such as 0.001.npy, 0.01.npy, 0.0001.npy?
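one way such files could be produced is by looping over learning rates and saving one curve per run; a hypothetical sketch, assuming NumPy (the `train` function below is a placeholder, not the repo's actual API):

```python
import numpy as np

# hypothetical: one training run per learning rate, one .npy file each
learning_rates = [1e-2, 1e-3, 1e-4]

def train(lr):
    # placeholder for a training run; returns a fake reward curve
    rng = np.random.default_rng(0)
    return rng.standard_normal(1000).cumsum()

for lr in learning_rates:
    curve = train(lr)
    np.save(f"{lr}.npy", curve)  # e.g. writes 0.001.npy
```

if only one file appears, a likely cause is that the script is run with a single learning rate rather than a loop like this.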