MarvinTeichmann / KittiSeg

A Kitti Road Segmentation model implemented in tensorflow.
MIT License

memory issue, train.py #118

Open qoo opened 7 years ago

qoo commented 7 years ago

Hi, I am using train.py to train on our own data, but it runs out of memory. Which parameter should I change? Our images are 6000×4000 pixels.

2017-08-28 21:55:18.073583: I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 10.55GiB
2017-08-28 21:55:18.073595: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 11332606362
InUse: 11332592128
MaxInUse: 11332592128
NumAllocs: 275
MaxAllocSize: 8108309248

2017-08-28 21:55:18.073630: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ***xxxxxxxxxxxxxxxxx
2017-08-28 21:55:18.073655: W tensorflow/core/framework/op_kernel.cc:1152] Resource exhausted: OOM when allocating tensor with shape[24000000,2]
Traceback (most recent call last):
  File "train.py", line 131, in <module>
    tf.app.run()
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train.py", line 127, in main
    train.do_training(hypes)
  File "incl/tensorvision/train.py", line 396, in do_training
    run_training(hypes, modules, tv_graph, tv_sess)
  File "incl/tensorvision/train.py", line 245, in run_training
    feed_dict=feed_dict)
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
    run_metadata_ptr)
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 982, in _run
    feed_dict_string, options, run_metadata)
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
    target_list, options, run_metadata)
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,3,4000,6000]
  [[Node: conv1_1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Processing/concat, conv1_1/filter/read)]]
  [[Node: Loss/loss/add_1/_27 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_1956_Loss/loss/add_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op u'conv1_1/Conv2D', defined at:
  File "train.py", line 131, in <module>
    tf.app.run()
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train.py", line 127, in main
    train.do_training(hypes)
  File "incl/tensorvision/train.py", line 377, in do_training
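
For scale, here is a rough back-of-the-envelope sketch of why a 6000×4000 input can exhaust the roughly 10.55 GiB reported in the log. It only estimates the size of a single float32 feature map. The 64 output channels for conv1_1 are an assumption (the width of the first layer in a standard VGG16), and the downscaled size is a hypothetical example, not a KittiSeg setting.

```python
# Rough estimate of the memory needed for one convolutional feature map.
# Assumptions: float32 activations (4 bytes each) and a batch size of 1.
BYTES_PER_FLOAT32 = 4

# "Limit" value copied from the allocator stats in the log above.
GPU_LIMIT_BYTES = 11332606362


def feature_map_bytes(height, width, channels, batch=1):
    """Bytes needed to hold one float32 feature map of the given shape."""
    return batch * height * width * channels * BYTES_PER_FLOAT32


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / float(2 ** 30)


# 4000 * 6000 pixels matches the 24000000 in shape[24000000,2] from the log.
pixels = 4000 * 6000

# Full-resolution input; 64 channels is an assumption (standard VGG16 conv1_1).
full = feature_map_bytes(4000, 6000, 64)

# Hypothetical input downscaled by 4x in each dimension.
small = feature_map_bytes(1000, 1500, 64)

print("pixels per image: %d" % pixels)
print("full-res conv1_1 output: %.2f GiB" % to_gib(full))
print("downscaled conv1_1 output: %.2f GiB" % to_gib(small))
print("GPU limit from log: %.2f GiB" % to_gib(GPU_LIMIT_BYTES))
```

Under these assumptions, two full-resolution feature maps of this size already exceed the GPU limit, and training has to keep many more activations around for backpropagation. Since the cost grows with height × width, resizing or cropping the images before they reach the network reduces it proportionally.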

saqib1707 commented 6 years ago

Hi @qoo. Can you please explain how you trained the KittiSeg model on your own data? In particular, where should the training and validation data be kept?