Continue Training from checkpoint and GPU training (ResourceExhaustedError)

movie3105 commented 5 years ago

I already did some training and then cancelled it, but a checkpoint had already been saved. How can I continue my training from this checkpoint?

morpheusthewhite commented 5 years ago

https://www.tensorflow.org/guide/saved_model#restore_variables

movie3105 commented 5 years ago

I'm sorry @morpheusthewhite, but can you tell me what to do to train from a checkpoint with this code? I didn't really understand what to do after reading the documentation.

morpheusthewhite commented 5 years ago

I think it should be enough to replace lines 82-133 in train.py with the code necessary to load the graph

movie3105 commented 5 years ago

I'm sorry sir, but I don't really know what to change. I will try a few things and hope they work. I would really appreciate it if you or anyone could explain in detail how to do that.

movie3105 commented 5 years ago

I'm just guessing here, but when I do this and start training, it starts from iteration 200, so I guess it works. The strange thing is that it still works when I use the snapshot for iteration 39990, so I don't really know whether this is the right way to train from a checkpoint.

morpheusthewhite commented 5 years ago

No, it doesn't do what you want; this simply starts the iteration count from 200 (in this case). Also, the self.snapshot() function STORES the model instead of loading it.

morpheusthewhite commented 5 years ago

@movie3105 try to replace lines 117-134 with the following ones (properly indented)

  self.saver = tf.train.Saver(max_to_keep=100000)

  last_snapshot_iter = __PUT_HERE_ITERATION_NUMBER__
  filename = 'vgg16_faster_rcnn_iter_{:d}'.format(last_snapshot_iter) + '.ckpt'
  filename = os.path.join(self.output_dir, filename)

  saver.restore(sess, filename)
  print("Model restored.")

movie3105 commented 5 years ago

So I tried to replace it like this, but it gives an error like this:

morpheusthewhite commented 5 years ago

My fault, it is self.saver.restore instead of saver.restore.

Also you need to deindent that block of code (the lines you added)
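
With that correction applied, the restore step could be sketched roughly as follows. This is only a sketch under assumptions: `latest_snapshot_iter` and `restore_latest` are hypothetical helper names that do not exist in the repository, the file-name prefix is copied from the snippet above, and `saver` and `sess` are assumed to be the `tf.train.Saver` and session that train.py already creates. Restoring reloads the saved variables only, so the training loop presumably still has to start counting from the returned iteration.

```python
import os
import re

# Snapshot naming copied from the snippet above. A TF 1.x checkpoint is stored
# as several files sharing this prefix (for example "<prefix>.index").
SNAPSHOT_PATTERN = re.compile(r"vgg16_faster_rcnn_iter_(\d+)\.ckpt")


def latest_snapshot_iter(output_dir):
    """Return the highest snapshot iteration found in output_dir, or None."""
    iters = set()
    for name in os.listdir(output_dir):
        match = SNAPSHOT_PATTERN.match(name)
        if match:
            iters.add(int(match.group(1)))
    return max(iters) if iters else None


def restore_latest(saver, sess, output_dir):
    """Restore the newest snapshot into sess and return its iteration number.

    Returns 0 when no snapshot exists, so the caller can train from scratch.
    """
    last_iter = latest_snapshot_iter(output_dir)
    if last_iter is None:
        return 0
    filename = "vgg16_faster_rcnn_iter_{:d}.ckpt".format(last_iter)
    # Saver.restore expects the checkpoint prefix, not one of the suffixed files.
    saver.restore(sess, os.path.join(output_dir, filename))
    print("Model restored from iteration {:d}.".format(last_iter))
    return last_iter
```

Inside train.py this would be called as `restore_latest(self.saver, sess, self.output_dir)` in place of the hard-coded iteration number, assuming those attributes are named as in the snippet above.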

movie3105 commented 5 years ago

So it's like this, right? But it gives this error. And why does it say "I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2"? Does that mean I'm training on the CPU?

movie3105 commented 5 years ago

Should I delete voc_2007_trainval_gt_roidb.pkl first?

morpheusthewhite commented 5 years ago

> And why does it say "I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2"? Does that mean I'm training on the CPU?

Probably yes. Have you installed tensorflow-gpu?

I can't find the line that causes the error (line 85 in train.py is different). Try to reinstall the project to make sure the code is correct and then modify it as said before.

morpheusthewhite commented 5 years ago

> Should I delete voc_2007_trainval_gt_roidb.pkl first?

Nope

movie3105 commented 5 years ago

I used this command when I installed TensorFlow: pip install tensorflow==2.0.0-beta1. Now I'll install TensorFlow for GPU; it's this one, right? pip install tensorflow-gpu==2.0.0-beta1. I have already installed tensorflow-gpu, but when I try to train again this error occurs:

morpheusthewhite commented 5 years ago

You are given the answer: you need to install CUDA 10

movie3105 commented 5 years ago

> You are given the answer: you need to install CUDA 10

I tried to do so, but why is it like this? I just realized it's CUDA 9. I'll try to install CUDA 10.

morpheusthewhite commented 5 years ago

I cannot help, I've never installed CUDA on Windows. Search the net to see whether anybody else has had your problem.

movie3105 commented 5 years ago

I'll try to figure this out first.

movie3105 commented 5 years ago

Somehow I found a way to install CUDA and cuDNN, but this error occurred. How can I solve it? I get this error when I try to train on the data:

movie3105 commented 5 years ago

Somehow I solved that problem, but now this happens. Does that mean I'm still training on my CPU?

I tried to check my GPU with: import tensorflow as tf; tf.test.gpu_device_name()

This is the result:

morpheusthewhite commented 5 years ago

Add print(tf.test.is_gpu_available()) to the file

movie3105 commented 5 years ago

> Add print(tf.test.is_gpu_available()) to the file

Which one? If you mean like this: import tensorflow as tf; tf.test.gpu_device_name(); print(tf.test.is_gpu_available()), then this is the result:

morpheusthewhite commented 5 years ago

You got the answer: no GPU is available. Look for help on the NVIDIA forum.
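
The checks used above can be combined into one small script that also distinguishes a CPU-only build from a GPU build that cannot reach the card. This is a hedged sketch: `describe_gpu_support` is a hypothetical helper name, and the messages are illustrative; it relies only on `tf.test.is_built_with_cuda()` and `tf.test.gpu_device_name()`.

```python
def describe_gpu_support():
    """Return a one-line summary of whether TensorFlow can see a GPU."""
    try:
        import tensorflow as tf
    except ImportError:
        return "TensorFlow is not installed in this environment"

    # A CPU-only package (plain "tensorflow" at the time) never uses the GPU.
    if not tf.test.is_built_with_cuda():
        return "CPU-only TensorFlow build: a GPU-enabled build is required"

    # gpu_device_name() returns an empty string when no GPU can be used.
    device_name = tf.test.gpu_device_name()
    if not device_name:
        return "GPU build, but no usable GPU: check driver, CUDA and cuDNN"
    return "GPU available at " + device_name


print(describe_gpu_support())
```

If the last message is printed, training should run on the GPU; the AVX2 notice quoted earlier is informational and appears on GPU builds as well.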

movie3105 commented 5 years ago

So CUDA already detects the GPU, but this problem appeared again. Do you have a solution, sir?

movie3105 commented 5 years ago

So I have already solved everything (the GPU etc.) and started the training, but this error occurred:

E:\Skripsi\Object Detection\Dataset\Faster-RCNN-TensorFlow-Python3-master>python train.py
WARNING: Logging before flag parsing goes to stderr.
W0806 19:10:42.565020  9512 __init__.py:687]

  TensorFlow's `tf-nightly` package will soon be updated to TensorFlow 2.0.

  Please upgrade your code to TensorFlow 2.0:
    * https://www.tensorflow.org/beta/guide/migration_guide

  Or install the latest stable TensorFlow 1.X release:
    * `pip install -U "tensorflow==1.*"`

  Otherwise your code may be broken by the change.

W0806 19:11:30.233638  9512 lazy_loader.py:50]
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

Loaded dataset `voc_2007_trainval` for training
Set proposal method: gt
Appending horizontally-flipped training examples...
wrote gt roidb to E:\Skripsi\Object Detection\Dataset\Faster-RCNN-TensorFlow-Python3-master\data\cache\voc_2007_trainval_gt_roidb.pkl
done
Preparing training data...
done
W0806 19:12:26.867298  9512 module_wrapper.py:137] From train.py:78: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

2019-08-06 19:12:27.277162: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-08-06 19:12:27.509794: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2019-08-06 19:12:29.418648: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce 940MX major: 5 minor: 0 memoryClockRate(GHz): 0.8605
pciBusID: 0000:01:00.0
2019-08-06 19:12:29.419075: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-08-06 19:12:29.444429: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-08-06 19:19:58.408426: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-06 19:19:58.409696: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2019-08-06 19:19:58.410024: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2019-08-06 19:19:58.727391: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1380 MB memory) -> physical GPU (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0)
W0806 19:20:00.657849  9512 deprecation.py:323] From C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\contrib\layers\python\layers\layers.py:1057: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
W0806 19:20:01.848965  9512 deprecation.py:323] From E:\Skripsi\Object Detection\Dataset\Faster-RCNN-TensorFlow-Python3-master\lib\nets\network.py:185: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0806 19:20:01.879657  9512 deprecation.py:323] From E:\Skripsi\Object Detection\Dataset\Faster-RCNN-TensorFlow-Python3-master\lib\nets\network.py:190: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.

W0806 19:20:02.446123  9512 deprecation.py:323] From E:\Skripsi\Object Detection\Dataset\Faster-RCNN-TensorFlow-Python3-master\lib\nets\network.py:120: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0806 19:20:02.479708  9512 deprecation.py:506] From E:\Skripsi\Object Detection\Dataset\Faster-RCNN-TensorFlow-Python3-master\lib\nets\network.py:129: calling crop_and_resize_v1 (from tensorflow.python.ops.image_ops_impl) with box_ind is deprecated and will be removed in a future version.
Instructions for updating:
box_ind is deprecated, use box_indices instead
W0806 19:20:02.568953  9512 deprecation.py:323] From C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\contrib\layers\python\layers\layers.py:1634: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
W0806 19:20:02.912324  9512 deprecation.py:323] From E:\Skripsi\Object Detection\Dataset\Faster-RCNN-TensorFlow-Python3-master\lib\nets\network.py:218: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0806 19:20:03.188959  9512 deprecation.py:506] From C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\util\dispatch.py:180: calling expand_dims (from tensorflow.python.ops.array_ops) with dim is deprecated and will be removed in a future version.
Instructions for updating:
Use the `axis` argument instead
W0806 19:20:03.197199  9512 module_wrapper.py:137] From E:\Skripsi\Object Detection\Dataset\Faster-RCNN-TensorFlow-Python3-master\lib\nets\network.py:60: The name tf.summary.image is deprecated. Please use tf.compat.v1.summary.image instead.

W0806 19:20:03.439898  9512 module_wrapper.py:137] From train.py:89: The name tf.train.MomentumOptimizer is deprecated. Please use tf.compat.v1.train.MomentumOptimizer instead.

Loading initial model weights from ./data/imagenet_weights/vgg16.ckpt
2019-08-06 19:20:09.908191: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 411041792 exceeds 10% of system memory.
2019-08-06 19:20:10.579952: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 411041792 exceeds 10% of system memory.
2019-08-06 19:20:11.099810: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 411041792 exceeds 10% of system memory.
2019-08-06 19:20:11.722286: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 411041792 exceeds 10% of system memory.
2019-08-06 19:20:12.321312: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 411041792 exceeds 10% of system memory.
2019-08-06 19:20:28.996242: W tensorflow/core/common_runtime/bfc_allocator.cc:419] Allocator (GPU_0_bfc) ran out of memory trying to allocate 392.00MiB (rounded to 411041792).  Current allocation summary follows.
...
2019-08-06 19:20:29.057149: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (268435456):     Total Chunks: 2, Chunks in use: 1. 768.24MiB allocated for chunks. 512.00MiB in use in bin. 392.00MiB client-requested in use in bin.
2019-08-06 19:20:29.143134: I tensorflow/core/common_runtime/bfc_allocator.cc:885] Bin for 392.00MiB was 256.00MiB, Chunk State:
2019-08-06 19:20:29.144418: I tensorflow/core/common_runtime/bfc_allocator.cc:891]   Size: 256.24MiB | Requested Size: 0B | in_use: 0 | bin_num: 20, prev:   Size: 256B | Requested Size: 28B | in_use: 1 | bin_num: -1
2019-08-06 19:20:29.146761: I tensorflow/core/common_runtime/bfc_allocator.cc:898] Next region of size 1048576
2019-08-06 19:20:29.149603: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000501700000 next 1 of size 256
...
2019-08-06 19:20:29.290740: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 000000050170ED00 next 52 of size 73728
...
2019-08-06 19:20:29.440749: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 00000005317C3200 next 109 of size 256
2019-08-06 19:20:29.443177: I tensorflow/core/common_runtime/bfc_allocator.cc:905] Free  at 00000005317C3300 next 18446744073709551615 of size 268684544
2019-08-06 19:20:29.445583: I tensorflow/core/common_runtime/bfc_allocator.cc:914]      Summary of in-use Chunks by size:
2019-08-06 19:20:29.447917: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 44 
...
2019-08-06 19:20:29.498278: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 536870912 totalling 512.00MiB
2019-08-06 19:20:29.501057: I tensorflow/core/common_runtime/bfc_allocator.cc:921] Sum Total of in-use chunks: 768.76MiB
2019-08-06 19:20:29.503524: I tensorflow/core/common_runtime/bfc_allocator.cc:923] total_region_allocated_bytes_: 1074790400 memory_limit_: 1447614873 available bytes: 372824473 curr_region_allocation_bytes_: 1073741824
2019-08-06 19:20:29.506013: I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats:
Limit:                  1447614873
InUse:                   806105856
MaxInUse:                806105856
NumAllocs:                     109
MaxAllocSize:            536870912

2019-08-06 19:20:29.508611: W tensorflow/core/common_runtime/bfc_allocator.cc:424] ***************************************xxxxxxxxxxx**************************________________________
2019-08-06 19:20:29.532457: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at random_op.cc:76 : Resource exhausted: OOM when allocating tensor with shape[25088,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
2019-08-06 19:20:39.539724: W tensorflow/core/common_runtime/bfc_allocator.cc:419] Allocator (GPU_0_bfc) ran out of memory trying to allocate 392.00MiB (rounded to 411041792).  Current allocation summary follows.
...
2019-08-06 19:20:39.989111: I tensorflow/core/common_runtime/bfc_allocator.cc:921] Sum Total of in-use chunks: 789.66MiB
2019-08-06 19:20:39.991480: I tensorflow/core/common_runtime/bfc_allocator.cc:923] total_region_allocated_bytes_: 1074790400 memory_limit_: 1447614873 available bytes: 372824473 curr_region_allocation_bytes_: 1073741824
2019-08-06 19:20:39.994008: I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats:
Limit:                  1447614873
InUse:                   828013568
MaxInUse:                828013568
NumAllocs:                     122
MaxAllocSize:            536870912

2019-08-06 19:20:39.996634: W tensorflow/core/common_runtime/bfc_allocator.cc:424] ***************************************xxxxxxxxxxx****************************______________________
2019-08-06 19:20:39.998679: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at assign_op.h:117 : Resource exhausted: OOM when allocating tensor with shape[25088,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\client\session.py", line 1360, in _do_call
    return fn(*args)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\client\session.py", line 1345, in _run_fn
    target_list, run_metadata)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\client\session.py", line 1438, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[25088,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node vgg_16/fc6/weights/Initializer/random_uniform/RandomUniform}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "train.py", line 216, in <module>
    train.train()
  File "train.py", line 120, in train
    sess.run(tf.variables_initializer(variables, name='init'))
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\client\session.py", line 951, in run
    run_metadata_ptr)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\client\session.py", line 1175, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\client\session.py", line 1354, in _do_run
    run_metadata)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\client\session.py", line 1379, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[25088,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[node vgg_16/fc6/weights/Initializer/random_uniform/RandomUniform (defined at C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\framework\ops.py:1695) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Original stack trace for 'vgg_16/fc6/weights/Initializer/random_uniform/RandomUniform':
  File "train.py", line 216, in <module>
    train.train()
  File "train.py", line 85, in train
    layers = self.net.create_architecture(sess, "TRAIN", self.imdb.num_classes, tag='default')
  File "E:\Skripsi\Object Detection\Dataset\Faster-RCNN-TensorFlow-Python3-master\lib\nets\network.py", line 295, in create_architecture
    rois, cls_prob, bbox_pred = self.build_network(sess, training)
  File "E:\Skripsi\Object Detection\Dataset\Faster-RCNN-TensorFlow-Python3-master\lib\nets\vgg16.py", line 39, in build_network
    cls_score, cls_prob, bbox_pred = self.build_predictions(net, rois, is_training, initializer, initializer_bbox)
  File "E:\Skripsi\Object Detection\Dataset\Faster-RCNN-TensorFlow-Python3-master\lib\nets\vgg16.py", line 166, in build_predictions
    fc6 = slim.fully_connected(pool5_flat, 4096, scope='fc6')
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\contrib\framework\python\ops\arg_scope.py", line 182, in func_with_args
    return func(*args, **current_args)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\contrib\layers\python\layers\layers.py", line 1866, in fully_connected
    outputs = layer.apply(inputs)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 1650, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\layers\base.py", line 547, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 774, in __call__
    self._maybe_build(inputs)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 2094, in _maybe_build
    self.build(input_shapes)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\keras\layers\core.py", line 1021, in build
    trainable=True)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\layers\base.py", line 460, in add_weight
    **kwargs)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 479, in add_weight
    aggregation=aggregation)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\training\tracking\base.py", line 712, in _add_variable_with_custom_getter
    **kwargs_for_getter)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\ops\variable_scope.py", line 1500, in get_variable
    aggregation=aggregation)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\ops\variable_scope.py", line 1243, in get_variable
    aggregation=aggregation)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\ops\variable_scope.py", line 550, in get_variable
    return custom_getter(**custom_getter_kwargs)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\contrib\layers\python\layers\layers.py", line 1761, in layer_variable_getter
    return _model_variable_getter(getter, *args, **kwargs)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\contrib\layers\python\layers\layers.py", line 1752, in _model_variable_getter
    aggregation=aggregation)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\contrib\framework\python\ops\arg_scope.py", line 182, in func_with_args
    return func(*args, **current_args)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\contrib\framework\python\ops\variables.py", line 351, in model_variable
    aggregation=aggregation)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\contrib\framework\python\ops\arg_scope.py", line 182, in func_with_args
    return func(*args, **current_args)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\contrib\framework\python\ops\variables.py", line 281, in variable
    aggregation=aggregation)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\ops\variable_scope.py", line 519, in _true_getter
    aggregation=aggregation)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\ops\variable_scope.py", line 933, in _get_single_variable
    aggregation=aggregation)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\ops\variables.py", line 256, in __call__
    return cls._variable_v1_call(*args, **kwargs)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\ops\variables.py", line 217, in _variable_v1_call
    shape=shape)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\ops\variables.py", line 195, in <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\ops\variable_scope.py", line 2519, in default_variable_creator
    shape=shape)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\ops\variables.py", line 260, in __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\ops\variables.py", line 1625, in __init__
    shape=shape)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\ops\variables.py", line 1755, in _init_from_args
    initial_value(), name="initial_value", dtype=dtype)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\ops\variable_scope.py", line 905, in <lambda>
    partition_info=partition_info)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\contrib\layers\python\layers\initializers.py", line 145, in _initializer
    dtype, seed=seed)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\ops\random_ops.py", line 241, in random_uniform
    rnd = gen_random_ops.random_uniform(shape, dtype, seed=seed1, seed2=seed2)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\ops\gen_random_ops.py", line 823, in random_uniform
    name=name)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\framework\op_def_library.py", line 793, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3304, in create_op
    attrs, op_def, compute_device)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3373, in _create_op_internal
    op_def=op_def)
  File "C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1695, in __init__
    self._traceback = tf_stack.extract_stack()

movie3105 commented 5 years ago

Which TensorFlow version does this code use?

morpheusthewhite commented 5 years ago

> tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[25088,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[node vgg_16/fc6/weights/Initializer/random_uniform/RandomUniform (defined at C:\Users\Ferdinan\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\framework\ops.py:1695) ]]

Your GPU memory is not sufficient to run this project
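
The numbers in the log bear this out. As a rough worked example, using only values copied from the log above (the constant and function names are illustrative), the fc6 weight tensor alone needs 392 MiB, which is more than both the free chunk shown and the room the allocator has left to grow:

```python
MIB = 1024 ** 2
BYTES_PER_FLOAT32 = 4

# Values copied from the log above.
FC6_SHAPE = (25088, 4096)        # shape of vgg_16/fc6/weights
MEMORY_LIMIT = 1447614873        # memory_limit_
REGION_ALLOCATED = 1074790400    # total_region_allocated_bytes_
FREE_CHUNK = 268684544           # the "Free" chunk shown in the log


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Size in bytes of a dense tensor with the given shape."""
    total = bytes_per_element
    for dim in shape:
        total *= dim
    return total


needed = tensor_bytes(FC6_SHAPE)
growable = MEMORY_LIMIT - REGION_ALLOCATED  # matches "available bytes" in the log

print("fc6 weights need {:.2f} MiB".format(needed / MIB))
print("free chunk shown: {:.2f} MiB".format(FREE_CHUNK / MIB))
print("room left to grow: {:.2f} MiB".format(growable / MIB))
print("request fits:", needed <= max(FREE_CHUNK, growable))
```

The first figure matches the "392.00MiB (rounded to 411041792)" request in the allocator warning, and the traceback shows that both the random initializer and the variable assignment ask for a tensor of that size.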

movie3105 commented 5 years ago

So 2 GB of VRAM can't run this program? Well, I guess my equipment really is lacking.

movie3105 commented 5 years ago

Can you help me with this error, @morpheusthewhite? AttributeError: module 'PIL' has no attribute 'Image'
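
One common cause of this AttributeError, offered as a guess because the failing line is not shown: a bare `import PIL` does not load the `Image` submodule, so `PIL.Image` only exists after an explicit `from PIL import Image` or `import PIL.Image`. A minimal sketch, assuming Pillow is the installed PIL implementation (`pil_image_size` is a hypothetical helper used only for illustration):

```python
def pil_image_size(width, height):
    """Create a blank RGB image and return its size, or None without Pillow."""
    try:
        # Importing the submodule explicitly is what makes Image available;
        # a bare "import PIL" is not enough.
        from PIL import Image
    except ImportError:
        return None
    return Image.new("RGB", (width, height)).size


print(pil_image_size(4, 3))
```

If the explicit import itself fails, the package is probably missing or broken, and reinstalling Pillow would be the next thing to try.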

dBeker / Faster-RCNN-TensorFlow-Python3

Continue Training from checkpoint and GPU training (ResourceExhaustedError) #87