amiratag / ACE

Towards Automatic Concept-based Explanations
MIT License

ResourceExhaustedError #8

Closed justcho5 closed 5 years ago

justcho5 commented 5 years ago

I was wondering if this type of error has ever been seen when using ACE:

```
Caused by op 'import/xception/block2_sepconv1/separable_conv2d', defined at:
  File "./ACE/ace_run.py", line 127, in <module>
    main(parse_arguments(sys.argv[1:]))
  File "./ACE/ace_run.py", line 40, in main
    sess, str(args.model_to_run), args.model_path, args.labels_path)
  File "/home/hjcho/projects/hnsc/histoXai/ACE/ace_helpers.py", line 45, in make_model
    mymodel = model.XceptionHPVWrapper_public(sess, model_path, labels_path)
  File "/home/hjcho/projects/hnsc/histoXai/tcav/tcav/model.py", line 477, in __init__
    super(XceptionHPVWrapper_public, self).__init__(sess, model_path, labels_path, image_shape, endpoints_xc, 'import')
  File "/home/hjcho/projects/hnsc/histoXai/tcav/tcav/model.py", line 251, in __init__
    scope=scope)
  File "/home/hjcho/projects/hnsc/histoXai/tcav/tcav/model.py", line 324, in import_graph
    input_graph_def, graph_inputs, list(endpoints.values()), name=sc)
  File "/home/hjcho/anaconda3/envs/env_tf113/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/hjcho/anaconda3/envs/env_tf113/lib/python3.7/site-packages/tensorflow/python/framework/importer.py", line 442, in import_graph_def
    _ProcessNewOps(graph)
  File "/home/hjcho/anaconda3/envs/env_tf113/lib/python3.7/site-packages/tensorflow/python/framework/importer.py", line 235, in _ProcessNewOps
    for new_op in graph._add_new_tf_operations(compute_devices=False):  # pylint: disable=protected-access
  File "/home/hjcho/anaconda3/envs/env_tf113/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3433, in _add_new_tf_operations
    for c_op in c_api_util.new_tf_operations(self)
  File "/home/hjcho/anaconda3/envs/env_tf113/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3433, in <listcomp>
    for c_op in c_api_util.new_tf_operations(self)
  File "/home/hjcho/anaconda3/envs/env_tf113/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3325, in _create_op_from_tf_operation
    ret = Operation(c_op, self)
  File "/home/hjcho/anaconda3/envs/env_tf113/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[100,296,296,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocato
	 [[node import/xception/block2_sepconv1/separable_conv2d (defined at /home/hjcho/projects/hnsc/histoXai/tcav/tcav/model.py:324) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[node import/xception/add_9/add (defined at /home/hjcho/projects/hnsc/histoXai/tcav/tcav/model.py:324) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
```

If you have seen it, do you know how it can be resolved?
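As a side note, the hint in the log about `report_tensor_allocations_upon_oom` refers to the TF 1.x `RunOptions` proto. A minimal sketch of how to follow it, assuming you can reach the `sess.run` call inside the TCAV/ACE model wrapper (the fetches and feed dict below are placeholders):

```python
import tensorflow as tf

# Minimal sketch (TF 1.x): ask TensorFlow to report which tensors were
# allocated when an OOM occurs, as the hint in the error message suggests.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

# Pass `options=run_options` to whichever sess.run call triggers the OOM,
# e.g.:
# sess.run(fetches, feed_dict=feed_dict, options=run_options)
```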

tabularML commented 5 years ago

Your batch size is too large for the machine you are using (100 is too large for most GPUs).

justcho5 commented 5 years ago

Thanks for the reply! Is the batch size specific to the model I am using, or is it a parameter of the ACE method? I used a batch size of 16 for training the model, so I'm not sure where the 100 in that tensor comes from. Is 100 the default?

justcho5 commented 5 years ago

I found it! The batch size defaults to 100 in the _patch_activations function in ace.py. Thank you!
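For anyone hitting the same error: the idea is to compute activations in smaller chunks, either by lowering the default `bs` where it is set in ace.py or by batching the calls yourself. A minimal sketch of the idea only, not the ACE code itself; `get_activations` stands in for whatever per-batch `sess.run` call ACE performs:

```python
import numpy as np

def batched_activations(imgs, get_activations, bs=16):
    """Run `get_activations` over `imgs` in chunks of `bs` images.

    Smaller batches shrink the intermediate Xception tensors
    (e.g. the [100, 296, 296, 128] one in the OOM above) so they
    fit in GPU memory; the per-batch results are concatenated.
    """
    outputs = []
    for i in range(0, len(imgs), bs):
        outputs.append(get_activations(imgs[i:i + bs]))
    return np.concatenate(outputs, axis=0)
```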