cedwards77 closed 4 years ago
Thank you for raising this issue, @cedwards77. It should be working now; please let me know if you encounter further errors. The problem came from my refactoring chunks of the code and integrating multiple agents. I have moved this code to my forked repository and will push once testing, validation and training are all working.
Please note that you may receive a memory error if there is not enough GPU memory available, such as:
I tensorflow/core/common_runtime/bfc_allocator.cc:818] total_region_allocated_bytes_: 1613758464 memory_limit_: 2126008811 available bytes: 512250347 curr_region_allocation_bytes_: 1073741824
2020-02-17 13:37:45.319171: I tensorflow/core/common_runtime/bfc_allocator.cc:824] Stats:
Limit: 2126008811
InUse: 923033856
MaxInUse: 923034624
NumAllocs: 307
MaxAllocSize: 559872000
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[48,32,45,45,45] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node conv0/Conv3D}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
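For anyone reading the log above, the size of the failing allocation can be checked by hand. The sketch below is only arithmetic on the numbers printed in the log, assuming "type float" means 4-byte float32; it does not describe how the allocator itself decides. The variable names are illustrative, and treating the leading dimension (48) as a batch size is an assumption.

```python
import math

# Values copied from the log above.
shape = (48, 32, 45, 45, 45)   # tensor shape in the OOM message
available_bytes = 512250347    # "available bytes" reported by bfc_allocator
bytes_per_element = 4          # assumption: "type float" is 32-bit float

# Bytes needed for one tensor of this shape.
needed_bytes = math.prod(shape) * bytes_per_element
print(needed_bytes)                    # 559872000, same as MaxAllocSize in the log
print(needed_bytes > available_bytes)  # True

# Assumption: the leading dimension is a batch size. Halving it halves the request.
half_batch_bytes = math.prod((24,) + shape[1:]) * bytes_per_element
print(half_batch_bytes)                # 279936000
```

If the leading dimension really is the batch size, a smaller batch would bring this single tensor under the reported available bytes, though other allocations made during training could still exhaust memory.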
Thank you for your quick response, @gml16! It seems to be working now.
The example code works great for the 'eval' and 'play' tasks, but when I tried running the training example, I got errors such as "TypeError: step() missing 1 required positional argument: 'isOver'".
Here is the command that I used: python DQN.py --task train --algo DQN --gpu 0 --files './data/filenames/image_files.txt' './data/filenames/landmark_files.txt'
Any help you could provide is greatly appreciated! I'm really excited about your published results.
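For context, Python raises this kind of TypeError when a method is called with fewer positional arguments than its signature requires, which can happen when a method gains a parameter during a refactor but a caller is not updated. The sketch below reproduces the message with a hypothetical `ToyEnv` class; the class, its parameters other than `isOver`, and the call sites are illustrative assumptions and are not taken from the repository.

```python
class ToyEnv:
    """Hypothetical stand-in; NOT the repository's environment class."""

    def step(self, action, q_values, isOver):
        # Just echo the arguments back so the call can be inspected.
        return action, q_values, isOver


env = ToyEnv()

# A caller written against an older two-argument signature fails.
try:
    env.step(0, [0.1, 0.9])
except TypeError as err:
    message = str(err)
    print(message)

# Passing the missing argument makes the same call succeed.
result = env.step(0, [0.1, 0.9], isOver=False)
print(result)
```

Under this reading, the place to look is the traceback line that calls `step()` in the training code path, compared against the current definition of `step()`.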