Open developeralgo8888 opened 7 years ago
That's an interesting observation. I've tested the code on a Maxwell TITAN X myself and didn't observe such behavior. Can you please share the versions of your libraries (Python, TensorFlow, CUDA, ...)? My (blind) guess is that this is a problem with TensorFlow. It would also be great if you could share your motherboard spec, since PCI-E is the bottleneck here.
Two side notes:
Check the GPU utilization (e.g., with the nvidia-smi command). It is also interesting to understand whether the number of agents is increasing during training; that may explain the increase in CPU usage.
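Since GA3C runs each agent as its own process, one quick way to check whether the agent count grows over time is to log the number of live child processes from the master process. A minimal standard-library sketch (illustrative only, not GA3C's own code):

```python
import multiprocessing as mp
import time

def agent_count():
    # Number of live child processes of the current (master) process.
    return len(mp.active_children())

if __name__ == "__main__":
    # Simulate three agent processes that stay alive for a moment.
    workers = [mp.Process(target=time.sleep, args=(2,)) for _ in range(3)]
    for w in workers:
        w.start()
    print("live agents:", agent_count())
    for w in workers:
        w.join()
```

Logging agent_count() periodically during training would show whether the dynamic adjustment keeps spawning new agents.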
Sent from my iPhone. Sorry for spelling errors :)
On Mar 19, 2017, at 11:36 AM, Mohammad Babaeizadeh wrote:
@ifrosio that's a very good point. @developeralgo8888 please try with DYNAMIC_SETTINGS=False
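For reference, this flag lives in GA3C's Config.py. A sketch of the relevant settings (the attribute names follow the public repo, but the default values shown here are assumptions and should be checked against your checkout):

```python
class Config:
    # Disable dynamic adjustment of the number of agents/predictors/trainers.
    DYNAMIC_SETTINGS = False

    # Fixed process counts used when dynamic adjustment is off
    # (illustrative values; tune these for your own machine).
    AGENTS = 32
    PREDICTORS = 2
    TRAINERS = 2
```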
Please find attached. I restarted the run, and usage has started increasing again as it goes.
With DYNAMIC_SETTINGS=False, the CPU remains stable, but there is still a memory leak: memory keeps increasing until the system freezes.
I have attached snapshots taken roughly 12 hours apart: High_CPU_and_memory.txt
The code runs fine at first, but it leaks CPU and memory and will eventually crash your system. I am using the Glances monitoring tool (pip install glances). If you leave the code running for a long time, you will notice that the number of CPU context switches increases substantially, and CPU and memory usage keep climbing until the code hangs or crashes. In my run, CPU usage increased from 6.7% to 64% and memory from 10% to 79%, at which point the system froze.

When I look at the NVIDIA TITAN X (Maxwell, 12 GB) usage, it is only using about 300 MB out of 12 GB. So while most of the heavy lifting should be offloaded to the GPU, that does not seem to be happening here. I have 8 x TITAN X (Maxwell) GPUs and 2 x Intel Xeon 2660 v3 CPUs (40 CPU cores in total) with 128 GB of DDR4 memory, and I can use any of the GPUs. I still get the same result: CPU usage keeps increasing.
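To quantify the growth without a full monitoring tool, the training process can log its own peak RSS and context-switch counts using only the standard library (Unix-only; an illustrative sketch, not part of GA3C):

```python
import resource
import time

def log_usage(tag=""):
    """Print peak RSS and context-switch counts for this process."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    # ru_maxrss is reported in kilobytes on Linux (bytes on macOS).
    print(f"{time.strftime('%H:%M:%S')} {tag} "
          f"peak_rss={ru.ru_maxrss} "
          f"voluntary_ctx_switches={ru.ru_nvcsw} "
          f"involuntary_ctx_switches={ru.ru_nivcsw}")

if __name__ == "__main__":
    log_usage("start")
    buf = [bytearray(1024) for _ in range(10_000)]  # simulate memory growth
    log_usage("after allocation")
```

Calling log_usage() every few minutes during training makes it easy to confirm whether RSS and context switches grow monotonically, as described above.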
Any insights?
The original A3C and various hybrid (CPU & GPU) versions seem to offload most of the heavy lifting to the GPU and cause no system freezes, but that is not the case with GA3C.
I am testing it with various amounts of data and different games.