BY571 / IQN-and-Extensions

PyTorch Implementation of Implicit Quantile Networks (IQN) for Distributional Reinforcement Learning with additional extensions like PER, Noisy layer, N-step bootstrapping, Dueling architecture and parallel env support.

BUG #3

Closed. THSWind closed this issue 4 years ago.

THSWind commented 4 years ago

Hello, Dittert. I'm here again. After several tests, I found that your code has a memory leak severe enough that the process gets killed by the system after running for only a few hours. I tested it with 'BreakoutDeterministic-v4', 'SpaceInvadersDeterministic-v4', and 'BreakoutNoFrameskip-v4'; all runs were killed by the system. In just two hours it takes up 62 GB of virtual memory. I didn't make any changes to your code.

I checked it with a memory profiler and found that the 'writer' was never closed, so I added writer.close() after it is used. Things seemed to get better, but the leak is still there, and I couldn't locate the problem. It seems to come from action = agent.act(state, eps) and agent.step(state, action, reward, next_state, done, writer, step, G_x) in run.py.
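For reference, a minimal sketch of that fix, assuming writer is a torch.utils.tensorboard SummaryWriter as in run.py; the log directory, loop body, and values below are stand-ins, not the repo's actual code:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/iqn_debug")   # hypothetical log directory
try:
    for step in range(100):                # stand-in for the real training loop
        reward = float(step)               # dummy value in place of the env reward
        writer.add_scalar("reward", reward, step)
finally:
    writer.close()                         # flush pending events and release the file handle
```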

BY571 commented 4 years ago

hey,

You mean the memory usage is steadily increasing? How big is your replay buffer? Remember that each tuple in the buffer contains two states, so two arrays of shape (4, 84, 84).

I haven't really run the algorithm for several hours at a stretch yet, but I can try that later.
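For scale, a quick back-of-the-envelope estimate (not from the thread itself) of what the state storage alone costs for a buffer of such tuples, comparing float32 against uint8 frames:

```python
import numpy as np

# Rough footprint of the state storage alone: each buffer tuple holds
# state and next_state, each a (4, 84, 84) frame stack.
values_per_state = int(np.prod((4, 84, 84)))   # 28224 values per state
buffer_size = 1_000_000

for dtype, itemsize in [("float32", 4), ("uint8", 1)]:
    bytes_per_tuple = 2 * values_per_state * itemsize  # two states per tuple
    total_gb = bytes_per_tuple * buffer_size / 1e9
    print(f"{dtype}: ~{total_gb:.0f} GB for {buffer_size:,} tuples")
# float32: ~226 GB, uint8: ~56 GB (states only, before actions/rewards)
```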

THSWind commented 4 years ago

@BY571 Yes, it keeps increasing until it is killed by the system. The replay buffer size is 1,000,000.

I tested it on several games, of course. They were all killed within a few hours (about 3 to 5 hours), without exception.

BY571 commented 4 years ago

@THSWind How much memory do you have on your GPU, and in general? I know from my own PC that it can't handle 1 million samples in the replay buffer. However, you could try saving the states as dtype=np.uint8; this should reduce memory usage.
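A minimal sketch of that idea (the buffer code in this repo may differ; everything here is illustrative): keep frames in the buffer as uint8 and only rescale to float32 when a batch is sampled.

```python
import numpy as np
import torch

# Stand-in for a real (4, 84, 84) frame stack; stored as uint8 (0-255),
# i.e. ~28 KB per state instead of ~110 KB as float32.
stored = np.random.randint(0, 256, size=(4, 84, 84), dtype=np.uint8)

# At sampling time, cast and rescale on the fly:
batch = torch.from_numpy(stored).float().div_(255.0).unsqueeze(0)  # shape (1, 4, 84, 84)
```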

THSWind commented 4 years ago

@BY571 This machine has 64 GB of memory (and a 2080 Ti GPU). Have you actually tested the code implementation, i.e. run it for 20,000,000 steps and checked whether it can complete the training task properly? I will try saving the states as dtype=np.uint8 as you suggest, but I don't think that's the root of the problem. Buddy, I'm following up on your work; it is important to me.

BY571 commented 4 years ago

I trained on the Pong environment several times and had no problems, although with a replay buffer of only 10,000 since my system does not have that much memory.

BY571 commented 4 years ago

@THSWind If you want to discuss the problem, you can join this Discord channel: https://discord.gg/kHrBHC