Closed TiantianZhang closed 1 year ago
It is completely fine. I have an RTX 2070, and a single episode takes approximately 3 minutes and 52 seconds. The GPU makes little difference here because there are only two networks, each a simple 2-layer MLP. A more powerful GPU would only give a speedup if the networks were much more sophisticated. Also, the environment runs on the CPU, so a faster GPU does not help with that part.
Regarding the number of episodes, I tried to stick to the original paper as closely as I could, since it states that the agent was trained for 5000 episodes of 20000 time steps each. However, I don't think that many episodes are needed, since the agent can already reach a reward of 15.5-16 in its first episode. In my opinion, 5-10 episodes would be good enough.
I've just updated the repo. Previously, I followed the original paper, but there were significant mistakes in both the theory and the implementation. Please clone and try the latest version. Also, the environment is highly stochastic, so you may not get the same results that I got; that is the nature of reinforcement learning.
I updated the repository again. There is a significant mistake in the paper, which I describe in the updated README.md. I fixed this issue, and the computational complexity is now substantially reduced, so you can obtain a well-trained agent in only 10,000 time steps. I reproduced the result without changing any default parameters. I got the following (smoothed with a sliding window of 100); the agent was trained for only a single episode.
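For anyone unsure what the sliding window of 100 refers to, here is a minimal sketch of that kind of smoothing. It assumes the per-step rewards are available as a 1-D array; the function name `moving_average` and the synthetic rewards are placeholders for illustration, not code or data from the repo.

```python
import numpy as np


def moving_average(values, window=100):
    """Average each run of `window` consecutive values (no padding)."""
    values = np.asarray(values, dtype=float)
    if window < 1 or window > values.size:
        raise ValueError("window must be between 1 and len(values)")
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the full window fits,
    # so the output has len(values) - window + 1 entries.
    return np.convolve(values, kernel, mode="valid")


# Synthetic stand-in for one episode of 10,000 per-step rewards.
rng = np.random.default_rng(0)
rewards = 15.0 + rng.normal(scale=1.0, size=10_000)

smoothed = moving_average(rewards, window=100)
print(smoothed.shape)
```

The smoothed curve is shorter than the raw one by `window - 1` points, which is worth keeping in mind when aligning the x-axis of a plot.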
The absolute results differ because of the stochasticity of the environment and of the learning process, such as the network initialization and the channel matrix initialization. To get exactly the result that I got, you would need to run the code with precisely the same hardware and software settings, which is practically impossible. So I see no remaining problems, and I'm closing this issue.
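Fixing the random seeds will not give bit-identical results across different hardware and software, but it usually reduces run-to-run variation on the same machine. A minimal sketch, assuming the randomness comes from Python's `random`, NumPy, and (if installed) PyTorch; `seed_everything` and `sample_channel` are hypothetical helpers written for this example, not functions from the repo.

```python
import random

import numpy as np


def seed_everything(seed: int) -> None:
    """Seed the common random number generators (hypothetical helper)."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch  # optional; assumed to be the deep learning framework in use
        torch.manual_seed(seed)
    except ImportError:
        pass


def sample_channel(n_rx: int, n_tx: int) -> np.ndarray:
    """Draw an illustrative complex Gaussian channel matrix (an assumption)."""
    real = np.random.normal(size=(n_rx, n_tx))
    imag = np.random.normal(size=(n_rx, n_tx))
    return (real + 1j * imag) / np.sqrt(2)


seed_everything(0)
first = sample_channel(2, 3)
seed_everything(0)
second = sample_channel(2, 3)
print(np.array_equal(first, second))
```

With the same seed, the two draws match on the same machine; GPU kernels and library versions can still introduce differences elsewhere in training.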
May I ask why only a single npy file is generated in the Learning Curves folder, instead of several systematically named ones such as 0.001.npy, 0.01.npy, and 0.0001.npy?
How long does it take to run this program with the default parameters? --num_time_steps_per_eps = 20000 --num_eps default=5000
I ran main.py on an RTX 3090 GPU, and one episode takes 4 minutes. Is this all right?
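As a back-of-the-envelope check on the total run time, here is a small sketch that multiplies the episode count by the measured time per episode. It assumes every episode takes the same 4 minutes as the one measured above; the function name is a placeholder.

```python
def total_training_hours(num_eps: int, minutes_per_eps: float) -> float:
    """Estimated wall-clock hours, assuming a constant time per episode."""
    return num_eps * minutes_per_eps / 60.0


# Default of 5000 episodes at 4 minutes each.
full_hours = total_training_hours(5000, 4.0)
print(f"{full_hours:.1f} hours, about {full_hours / 24:.1f} days")

# A short run of 10 episodes at the same rate.
short_hours = total_training_hours(10, 4.0)
print(f"{short_hours * 60:.0f} minutes")
```

At that rate the default 5000 episodes come to roughly 333 hours, or close to 14 days, whereas 10 episodes finish in 40 minutes.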