experiencor / keras-yolo3

Training and Detecting Objects with YOLO3
MIT License

Training with gpu #151

Open ssetty opened 5 years ago

ssetty commented 5 years ago

Hello.

When I train with a GPU, I get the exception below:

resizing: 448 448
resizing: 448 448
Traceback (most recent call last):
  File "/home/aic_subscription/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/home/aic_subscription/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/aic_subscription/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
  [[Node: replica_1/model_1/yolo_layer_3/cond/strided_slice/Switch/_4523 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_13819_replica_1/model_1/yolo_layer_3/cond/strided_slice/Switch", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 280, in <module>
    main(args)
  File "train.py", line 257, in main
    max_queue_size = 8

  File "/home/aic_subscription/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/home/aic_subscription/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
  [[Node: replica_1/model_1/yolo_layer_3/cond/strided_slice/Switch/_4523 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_13819_replica_1/model_1/yolo_layer_3/cond/strided_slice/Switch", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

abramjos commented 5 years ago

Try reducing the batch size or the image size. "Dst tensor is not initialized" usually means the GPU ran out of memory, so the model cannot allocate what it needs to train.
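As a rough sketch of that suggestion, the snippet below halves the batch size and lowers the largest training resolution by one step. The key names (`train.batch_size`, `model.min_input_size`, `model.max_input_size`) and the helper `shrink_training_config` are assumptions made for illustration, so check them against your own config file. The step of 32 is also an assumption, chosen so that input sizes stay multiples of 32 like the 448 seen in the log.

```python
import copy


def shrink_training_config(config, size_step=32):
    """Return a copy of `config` with a smaller memory footprint.

    Hypothetical helper: the key names below are assumed, not taken
    from the thread. Adjust them to match your actual config file.
    """
    new_config = copy.deepcopy(config)  # leave the caller's dict untouched
    train = new_config["train"]
    model = new_config["model"]

    # Halve the batch size, but never go below 1.
    train["batch_size"] = max(1, train["batch_size"] // 2)

    # Lower the largest input size by one step, but not below the minimum.
    floor = model.get("min_input_size", size_step)
    model["max_input_size"] = max(floor, model["max_input_size"] - size_step)
    return new_config


# Example values only; they are not taken from the reporter's setup.
example = {
    "model": {"min_input_size": 288, "max_input_size": 448},
    "train": {"batch_size": 16},
}
smaller = shrink_training_config(example)
print(smaller["train"]["batch_size"], smaller["model"]["max_input_size"])
```

If training still fails after one pass, apply the helper again and retry. Each halving of the batch size roughly halves the activation memory needed per step.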