erikalu / class-agnostic-counting


GPU requirement for the model to train #4

Closed sri9s closed 4 years ago

sri9s commented 4 years ago

Hey, I use an RTX 2080 Ti. When I train/adapt the model on the VGG cell dataset, I receive the following error:

2019-12-07 01:12:55.503392: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 1073741824 totalling 1.00GiB
2019-12-07 01:12:55.505425: I tensorflow/core/common_runtime/bfc_allocator.cc:816] Sum Total of in-use chunks: 8.00GiB
2019-12-07 01:12:55.512976: I tensorflow/core/common_runtime/bfc_allocator.cc:818] total_region_allocatedbytes: 9078499328 memorylimit: 9078499573 available bytes: 245 curr_region_allocationbytes: 8589934592
2019-12-07 01:12:55.516720: I tensorflow/core/common_runtime/bfc_allocator.cc:824] Stats: Limit: 9078499573 InUse: 8586890240 MaxInUse: 8586939392 NumAllocs: 4677 MaxAllocSize: 3066167296

2019-12-07 01:12:55.521044: W tensorflow/core/common_runtime/bfc_allocator.cc:319] ****____
2019-12-07 01:12:55.532038: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at transpose_op.cc:199 : Resource exhausted: OOM when allocating tensor with shape[24,512,100,100] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\tensorflow\python\client\session.py", line 1356, in _do_call
    return fn(*args)
  File "C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\tensorflow\python\client\session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\tensorflow\python\client\session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[24,256,200,200] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node resnet_base_1/bn2a_branch2c/FusedBatchNorm}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[loss_1/add_59/_7669]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[24,256,200,200] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node resnet_base_1/bn2a_branch2c/FusedBatchNorm}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations. 0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 172, in <module>
    adapt_gmn()
  File "main.py", line 165, in adapt_gmn
    verbose=1)
  File "C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\keras\engine\training.py", line 2224, in fit_generator
    class_weight=class_weight)
  File "C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\keras\engine\training.py", line 1883, in train_on_batch
    outputs = self.train_function(ins)
  File "C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\keras\backend\tensorflow_backend.py", line 2478, in __call__
    **self.session_kwargs)
  File "C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\tensorflow\python\client\session.py", line 950, in run
    run_metadata_ptr)
  File "C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\tensorflow\python\client\session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\tensorflow\python\client\session.py", line 1350, in _do_run
    run_metadata)
  File "C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\tensorflow\python\client\session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[24,256,200,200] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node resnet_base_1/bn2a_branch2c/FusedBatchNorm (defined at C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\keras\backend\tensorflow_backend.py:1802) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[loss_1/add_59/_7669]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[24,256,200,200] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node resnet_base_1/bn2a_branch2c/FusedBatchNorm (defined at C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\keras\backend\tensorflow_backend.py:1802) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations. 0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node resnet_base_1/bn2a_branch2c/FusedBatchNorm:
 resnet_base_1/res2a_branch2c/BiasAdd (defined at C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\keras\backend\tensorflow_backend.py:3769)
 bn2a_branch2c_1/gamma/read (defined at C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\keras\backend\tensorflow_backend.py:395)

Input Source operations connected to node resnet_base_1/bn2a_branch2c/FusedBatchNorm:
 resnet_base_1/res2a_branch2c/BiasAdd (defined at C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\keras\backend\tensorflow_backend.py:3769)
 bn2a_branch2c_1/gamma/read (defined at C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\keras\backend\tensorflow_backend.py:395)

Original stack trace for 'resnet_base_1/bn2a_branch2c/FusedBatchNorm':
  File "main.py", line 172, in <module>
    adapt_gmn()
  File "main.py", line 114, in adapt_gmn
    model = model_factory.two_stream_matching_networks(trn_config, sync=False, adapt=True)
  File "C:\Users\Sri\Documents\GitHub\class-agnostic-counting\src\model_factory.py", line 99, in two_stream_matching_networks
    image_f = basenet(inputs[1])
  File "C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\keras\engine\topology.py", line 619, in __call__
    output = self.call(inputs, **kwargs)
  File "C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\keras\engine\topology.py", line 2085, in call
    output_tensors, _, _ = self.run_internal_graph(inputs, masks)
  File "C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\keras\engine\topology.py", line 2236, in run_internal_graph
    output_tensors = _to_list(layer.call(computed_tensor, **kwargs))
  File "C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\keras\layers\normalization.py", line 181, in call
    epsilon=self.epsilon)
  File "C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\keras\backend\tensorflow_backend.py", line 1827, in normalize_batch_in_training
    epsilon=epsilon)
  File "C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\keras\backend\tensorflow_backend.py", line 1802, in _fused_normalize_batch_in_training
    data_format=tf_data_format)
  File "C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\tensorflow\python\ops\nn_impl.py", line 1329, in fused_batch_norm
    name=name)
  File "C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 4301, in _fused_batch_norm
    name=name)
  File "C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\tensorflow\python\framework\ops.py", line 3616, in create_op
    op_def=op_def)
  File "C:\Users\Sri\Anaconda3\envs\counting\lib\site-packages\tensorflow\python\framework\ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

erikalu commented 4 years ago

Hi, you're getting an OOM (out of memory) error. Use a smaller batch size than the default one (which is 24) by passing in, e.g. --batch_size 8.
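To see why a smaller batch helps, here is a rough back-of-the-envelope sketch in Python. It only sizes the single activation tensor named in the log, shape[24,256,200,200] of type float (32-bit, so 4 bytes per element). The helper name `tensor_gib` is made up for illustration and is not part of this repository. Training keeps many such tensors alive at once, so the real footprint is a multiple of this figure.

```python
from math import prod

BYTES_PER_FLOAT32 = 4  # "type float" in the TensorFlow log is 32-bit


def tensor_gib(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Size in GiB of one dense tensor with the given shape."""
    return prod(shape) * bytes_per_element / 2**30


# The first dimension of the failing tensor is the batch size.
for batch_size in (24, 8):
    shape = (batch_size, 256, 200, 200)
    print(f"batch_size={batch_size}: {tensor_gib(shape):.2f} GiB")
# batch_size=24: 0.92 GiB
# batch_size=8: 0.31 GiB
```

The size scales linearly with the batch dimension, so going from 24 to 8 cuts each activation of this kind to a third. Whether 8 actually fits on a given card still has to be checked by running it.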

sri9s commented 4 years ago

> Hi, you're getting an OOM (out of memory) error. Use a smaller batch size than the default one (which is 24) by passing in, e.g. --batch_size 8.

Thanks, could you possibly point me to a point annotation tool?

erikalu commented 4 years ago

> Thanks, could you possibly point me to a point annotation tool?

Sorry, can you further clarify what you are trying to do? If you want to label some images, you can script an interface using Matplotlib.
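As a minimal sketch of what such a Matplotlib script might look like: the code below records one (x, y) point per mouse click and writes the points to a CSV file. The names `PointCollector`, `save_points`, and `annotate`, and the CSV layout, are assumptions made for illustration. They are not part of this repository, and the annotation format its data loaders expect may differ.

```python
import csv


class PointCollector:
    """Accumulates (x, y) click positions in image pixel coordinates."""

    def __init__(self):
        self.points = []

    def on_click(self, event):
        # Matplotlib sets xdata/ydata to None for clicks outside the axes.
        if event.xdata is None or event.ydata is None:
            return
        self.points.append((round(event.xdata), round(event.ydata)))


def save_points(points, csv_path):
    """Write the points as rows of x,y under a header line."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["x", "y"])
        writer.writerows(points)


def annotate(image_path, csv_path):
    """Show the image, collect clicks until the window closes, save them."""
    # Imported here so the helpers above can be used without a display.
    import matplotlib.pyplot as plt

    collector = PointCollector()
    fig, ax = plt.subplots()
    ax.imshow(plt.imread(image_path))
    ax.set_title("Click each object once, then close the window")
    fig.canvas.mpl_connect("button_press_event", collector.on_click)
    plt.show()  # blocks until the window is closed
    save_points(collector.points, csv_path)
    return collector.points


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python annotate.py image.png dots.csv
    if len(sys.argv) == 3:
        print(len(annotate(sys.argv[1], sys.argv[2])), "points saved")
```

Keeping the click handler separate from the plotting code means it can be exercised without opening a window, and the saved points can later be converted to whatever format the training code needs.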