hyejung-rachel opened this issue 5 years ago
What's your system configuration?
Hello, I'm facing the same problem when training on the kangaroo dataset. Reducing the training batch size from 16 to 8 and then to 4 has not changed anything. System: 8 GB RAM, Intel(R) UHD Graphics 630 (integrated), GeForce GTX 1050 3 GB GPU (shared-memory extension up to 8 GB), Windows 10 + Anaconda.
Running:

```python
import tensorflow as tf
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
```

I get the response:

```
GPU:0 with 2131 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 18102239670215265869
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 2235275673
locality {
  bus_id: 1
  links {
  }
}
incarnation: 6041356209009565047
physical_device_desc: "device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1"
]
```
Thanks for your help and advice.
I think your model is too heavy to train on your GPU. A quick comparison for you: a 1050 Ti tested OK with batch_size 1 on a 9,500-picture dataset; 2, 4 or 8 might be OK.
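A back-of-the-envelope check (my own sketch, not from the thread) shows why shrinking the batch alone may not be enough here: the tensor reported in the OOM message has shape [4, 32, 1952, 1952] in float32, and that single activation is almost as large as the ~2131 MB the 1050 has free.

```python
# Rough estimate of the memory needed for the single activation tensor
# reported in the OOM message: shape [4, 32, 1952, 1952], dtype float32.
def tensor_bytes(shape, bytes_per_elem=4):
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_elem

oom_shape = (4, 32, 1952, 1952)  # batch, channels, height, width
gib = tensor_bytes(oom_shape) / 2**30
print(f"{gib:.2f} GiB")  # ~1.82 GiB for this one tensor alone
```

Even at batch size 4 this one intermediate tensor nearly fills the card, before weights, gradients, and the other activations are counted.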
```
Using TensorFlow backend.
/anaconda/envs/py35/lib/python3.5/site-packages/keras/callbacks.py:999: UserWarning: `epsilon` argument is deprecated and will be removed, use `min_delta` instead.
  warnings.warn('`epsilon` argument is deprecated and '
2019-03-12 15:23:38.002612: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Sum Total of in-use chunks: 14.42GiB
2019-03-12 15:23:38.002618: I tensorflow/core/common_runtime/bfc_allocator.cc:680] Stats:
Limit:                 15866508084
InUse:                 15487862272
MaxInUse:              15501256704
NumAllocs:                    2093
MaxAllocSize:           2034522624

2019-03-12 15:23:38.002688: W tensorflow/core/common_runtime/bfc_allocator.cc:279] *******
2019-03-12 15:23:38.002714: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at transpose_op.cc:199 : Resource exhausted: OOM when allocating tensor with shape[4,32,1952,1952] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "train.py", line 280, in <module>
    main(args)
  File "train.py", line 257, in main
    max_queue_size = 8
  File "/anaconda/envs/py35/lib/python3.5/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/keras/engine/training.py", line 1415, in fit_generator
    initial_epoch=initial_epoch)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/keras/engine/training_generator.py", line 213, in fit_generator
    class_weight=class_weight)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/keras/engine/training.py", line 1215, in train_on_batch
    outputs = self.train_function(ins)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2666, in __call__
    return self._call(inputs)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2636, in _call
    fetched = self._callable_fn(*array_vals)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1454, in __call__
    self._session._session, self._handle, args, status, None)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[4,32,1952,1952] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[Node: replica_0/model_1/leaky_0/LeakyRelu/mul = Mul[T=DT_FLOAT, _class=["loc:@train.../Reshape_1"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](replica_0/model_1/leaky_80/LeakyRelu/alpha, replica_0/model_1/bnorm_0/cond/Merge)]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
```

It is not possible to reduce the resolution of the images I am training on. Can I still train this model? Reducing the batch size doesn't work, and neither does reducing the amount of data. Thank you!