Nkap23 / TensorFlow_with_Colab_tutorial

TensorFlow 2 Object Detection API with Google Colab!
MIT License

Step 11 does not work for me #1

Closed DtXFS closed 3 years ago

DtXFS commented 3 years ago

Hello, I am following your tutorial and so far everything has worked fine for me, because it is described so clearly. But at Step 11 I get a long traceback and the two .record files are not generated. I checked that all the paths are right and that generate_tfrecord.py and label_map.pbtxt are in place. Can you help me?

Here is part of the output I get while running Step 11:

/content/gdrive/My Drive/TensorFlow/scripts/preprocessing
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/object_detection-0.1-py3.6.egg/object_detection/utils/label_map_util.py", line 159, in load_labelmap
    text_format.Merge(label_map_string, label_map)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 734, in Merge
    allow_unknown_field=allow_unknown_field)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 802, in MergeLines
    return parser.MergeLines(lines, message)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 827, in MergeLines
    self._ParseOrMerge(lines, message)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 849, in _ParseOrMerge
    self._MergeField(tokenizer, message)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 974, in _MergeField
    merger(tokenizer, message, field)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 1048, in _MergeMessageField
    self._MergeField(tokenizer, sub_message)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 974, in _MergeField
    merger(tokenizer, message, field)
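The traceback above ends inside `text_format.Merge`, called from `label_map_util.load_labelmap`, which suggests that the text of `label_map.pbtxt` itself could not be parsed. One quick pre-check, sketched below under the assumption that the file is UTF-8 text, is to scan it for characters that protobuf text format does not treat as string quotes (typographic quotes, acute accents, backticks) and replace them with plain ASCII quotes. `find_suspicious_chars` and `normalize_quotes` are hypothetical helpers, not part of the tutorial or the Object Detection API.

```python
import unicodedata

# Characters that often sneak into label maps copied from web pages,
# mapped to the plain ASCII quote that protobuf text format expects.
QUOTE_FIXES = {
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u00b4": "'",  # ACUTE ACCENT
    "`": "'",       # GRAVE ACCENT (backtick)
}


def find_suspicious_chars(text):
    """Return (line, column, char, unicode name) for each non-ASCII char or backtick."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127 or ch == "`":
                hits.append((line_no, col, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


def normalize_quotes(text):
    """Replace every character listed in QUOTE_FIXES with its ASCII quote."""
    return text.translate(str.maketrans(QUOTE_FIXES))


# Hypothetical label map pasted with typographic quotes around the name.
pasted = "item {\n    id: 1\n    name: \u2018banana\u2019\n}\n"
print(find_suspicious_chars(pasted))
print(normalize_quotes(pasted))
```

To check a real file, read it with `open(path, encoding="utf-8")` and pass the text to both helpers; the path depends on where your annotations folder lives.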

Thanks in advance!

Nkap23 commented 3 years ago

Hi! Have you added the correct paths to the train/test folder and the annotations folder in step 11?

Also, make sure that the labels you used to annotate your images and the 'name' values in label_map.pbtxt are exactly the same! (For example, if the label in the image is 'banana' and you used 'Banana' in label_map.pbtxt, the code will throw an error.)
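A rough way to catch such mismatches before running the conversion is to compare the label names used in the annotation files with the `name` entries in `label_map.pbtxt`. The sketch below assumes Pascal VOC XML annotations (the format labelImg writes, with each box under `<object><name>...</name>`) and extracts label-map names with a simple regex rather than a real protobuf parser; all function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def label_map_names(pbtxt_text):
    """Collect the quoted name: values from label_map.pbtxt text."""
    # \b stops `display_name:` from matching; this is a regex, not a protobuf parse.
    return set(re.findall(r"\bname:\s*['\"]([^'\"]+)['\"]", pbtxt_text))


def labels_from_voc_xml(xml_text):
    """Return the <name> of every <object> in one Pascal VOC annotation."""
    root = ET.fromstring(xml_text)
    return [obj.findtext("name") for obj in root.iter("object")]


def unmatched_labels(xml_texts, pbtxt_text):
    """Annotation labels with no exact, case-sensitive match in the label map."""
    used = {label for text in xml_texts for label in labels_from_voc_xml(text)}
    return sorted(used - label_map_names(pbtxt_text))


label_map = """
item {
    id: 1
    name: 'Banana'
}
"""
annotation = "<annotation><object><name>banana</name></object></annotation>"
print(unmatched_labels([annotation], label_map))  # ['banana']
```

An empty list means every annotation label has an exact match in the label map.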

Also, is this the whole output you are getting? Usually the last few lines describe the actual error!

Also a NOTE: if you are working across multiple Colab sessions, you need to run steps 6 to 9 again in every new session!

DtXFS commented 3 years ago

Thanks for your quick reply! I checked everything again and found the problem: I had copied the contents of label_map.pbtxt from your website, and the quote marks got mangled. They were `´ instead of ''. I fixed that, but now the training is not working. I am using my own dataset but set up all the files just as you did. I would be very happy if you could help again.

In Step 14, TensorBoard only shows "No dashboards are active for the current data set." and nothing changes. For Step 15, here are the last lines of the output:

2020-09-28 15:18:57.625063: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 9 Chunks of size 147456 totalling 1.27MiB
2020-09-28 15:18:57.625075: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 158976 totalling 155.2KiB
2020-09-28 15:18:57.625086: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 160000 totalling 156.2KiB
2020-09-28 15:18:57.625181: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 3 Chunks of size 221184 totalling 648.0KiB
2020-09-28 15:18:57.625200: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 26 Chunks of size 262144 totalling 6.50MiB
2020-09-28 15:18:57.625213: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 5 Chunks of size 276480 totalling 1.32MiB
2020-09-28 15:18:57.625234: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 2 Chunks of size 327680 totalling 640.0KiB
2020-09-28 15:18:57.625246: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 393216 totalling 384.0KiB
2020-09-28 15:18:57.625262: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 6 Chunks of size 409600 totalling 2.34MiB
2020-09-28 15:18:57.625277: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 442368 totalling 432.0KiB
2020-09-28 15:18:57.625289: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 14 Chunks of size 524288 totalling 7.00MiB
2020-09-28 15:18:57.625304: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 16 Chunks of size 589824 totalling 9.00MiB
2020-09-28 15:18:57.625319: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 5 Chunks of size 638976 totalling 3.05MiB
2020-09-28 15:18:57.625331: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 673792 totalling 658.0KiB
2020-09-28 15:18:57.625363: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 720896 totalling 704.0KiB
2020-09-28 15:18:57.625592: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 818432 totalling 799.2KiB
2020-09-28 15:18:57.625618: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 34 Chunks of size 1048576 totalling 34.00MiB
2020-09-28 15:18:57.625632: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 2 Chunks of size 1310720 totalling 2.50MiB
2020-09-28 15:18:57.625654: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 1572864 totalling 1.50MiB
2020-09-28 15:18:57.625665: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 1638400 totalling 1.56MiB
2020-09-28 15:18:57.625675: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 1703936 totalling 1.62MiB
2020-09-28 15:18:57.625686: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 12 Chunks of size 2097152 totalling 24.00MiB
2020-09-28 15:18:57.625698: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 66 Chunks of size 2359296 totalling 148.50MiB
2020-09-28 15:18:57.625709: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 2697216 totalling 2.57MiB
2020-09-28 15:18:57.625721: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 2 Chunks of size 2883584 totalling 5.50MiB
2020-09-28 15:18:57.625732: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 2984960 totalling 2.85MiB
2020-09-28 15:18:57.625846: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 3033088 totalling 2.89MiB
2020-09-28 15:18:57.625866: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 3538944 totalling 3.38MiB
2020-09-28 15:18:57.625879: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 3 Chunks of size 3670016 totalling 10.50MiB
2020-09-28 15:18:57.625890: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 3932160 totalling 3.75MiB
2020-09-28 15:18:57.625902: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 3948544 totalling 3.77MiB
2020-09-28 15:18:57.625914: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 20 Chunks of size 4194304 totalling 80.00MiB
2020-09-28 15:18:57.625925: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 4444160 totalling 4.24MiB
2020-09-28 15:18:57.625937: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 13 Chunks of size 6553600 totalling 81.25MiB
2020-09-28 15:18:57.625954: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 4 Chunks of size 8388608 totalling 32.00MiB
2020-09-28 15:18:57.625970: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 14 Chunks of size 9437184 totalling 126.00MiB
2020-09-28 15:18:57.625986: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 13094400 totalling 12.49MiB
2020-09-28 15:18:57.625999: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 12 Chunks of size 13107200 totalling 150.00MiB
2020-09-28 15:18:57.626015: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 7 Chunks of size 16368128 totalling 109.27MiB
2020-09-28 15:18:57.626027: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 62 Chunks of size 20460032 totalling 1.18GiB
2020-09-28 15:18:57.626062: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 22118400 totalling 21.09MiB
2020-09-28 15:18:57.626080: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 24816384 totalling 23.67MiB
2020-09-28 15:18:57.626092: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 43 Chunks of size 26214400 totalling 1.05GiB
2020-09-28 15:18:57.626109: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 39742976 totalling 37.90MiB
2020-09-28 15:18:57.626121: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 22 Chunks of size 52428800 totalling 1.07GiB
2020-09-28 15:18:57.626137: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 78643200 totalling 75.00MiB
2020-09-28 15:18:57.626153: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 80124672 totalling 76.41MiB
2020-09-28 15:18:57.626166: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 93019136 totalling 88.71MiB
2020-09-28 15:18:57.626181: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 41 Chunks of size 104857600 totalling 4.00GiB
2020-09-28 15:18:57.626193: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 128570624 totalling 122.61MiB
2020-09-28 15:18:57.626209: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 129916928 totalling 123.90MiB
2020-09-28 15:18:57.626235: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 130590208 totalling 124.54MiB
2020-09-28 15:18:57.626252: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 181806848 totalling 173.38MiB
2020-09-28 15:18:57.626268: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 9 Chunks of size 209715200 totalling 1.76GiB
2020-09-28 15:18:57.626281: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 6 Chunks of size 419430400 totalling 2.34GiB
2020-09-28 15:18:57.626297: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 481582080 totalling 459.27MiB
2020-09-28 15:18:57.626312: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 524345344 totalling 500.05MiB
2020-09-28 15:18:57.626328: I tensorflow/core/common_runtime/bfc_allocator.cc:1034] 1 Chunks of size 549453824 totalling 524.00MiB
2020-09-28 15:18:57.626344: I tensorflow/core/common_runtime/bfc_allocator.cc:1038] Sum Total of in-use chunks: 14.57GiB
2020-09-28 15:18:57.626360: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] total_region_allocatedbytes: 15695549440 memorylimit: 15695549568 available bytes: 128 curr_region_allocationbytes: 17179869184
2020-09-28 15:18:57.626381: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Stats: Limit: 15695549568 InUse: 15642191104 MaxInUse: 15642299136 NumAllocs: 114174 MaxAllocSize: 2723676160 Reserved: 0 PeakReserved: 0 LargestFreeBlock: 0

2020-09-28 15:18:57.626524: W tensorflow/core/common_runtime/bfc_allocator.cc:439] ****
2020-09-28 15:18:57.626571: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at cwise_ops_common.cc:82 : Resource exhausted: OOM when allocating tensor with shape[100,51150] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "model_main_tf2.py", line 113, in <module>
    tf.compat.v1.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "model_main_tf2.py", line 110, in main
    record_summaries=FLAGS.record_summaries)
  File "/usr/local/lib/python3.6/dist-packages/object_detection-0.1-py3.6.egg/object_detection/model_lib_v2.py", line 639, in train_loop
    loss = _dist_train_step(train_input_iter)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 807, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2829, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1848, in _filtered_call
    cancellation_manager=cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 550, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[100,51150] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node Loss/Compare_10/IOU/Intersection/Minimum_1 (defined at /local/lib/python3.6/dist-packages/object_detection-0.1-py3.6.egg/object_detection/core/box_list_ops.py:257) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[Func/learning_rate_1/write_summary/summary_cond/then/_50/input/_131/_400]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[100,51150] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node Loss/Compare_10/IOU/Intersection/Minimum_1 (defined at /local/lib/python3.6/dist-packages/object_detection-0.1-py3.6.egg/object_detection/core/box_list_ops.py:257) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored. [Op:__inference__dist_train_step_52374]

Errors may have originated from an input operation.
Input Source operations connected to node Loss/Compare_10/IOU/Intersection/Minimum_1:
Loss/Compare_10/IOU/Intersection/split (defined at /local/lib/python3.6/dist-packages/object_detection-0.1-py3.6.egg/object_detection/core/box_list_ops.py:250)

Input Source operations connected to node Loss/Compare_10/IOU/Intersection/Minimum_1:
Loss/Compare_10/IOU/Intersection/split (defined at /local/lib/python3.6/dist-packages/object_detection-0.1-py3.6.egg/object_detection/core/box_list_ops.py:250)

Function call stack: _dist_train_step -> _dist_train_step
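The numbers in this log fit together, which makes the OOM easier to reason about. A float32 tensor of shape [100, 51150] takes 100 × 51150 × 4 bytes; assuming TensorFlow's BFC allocator rounds each request up to a multiple of 256 bytes, that is exactly the 20460032-byte chunk size that appears 62 times (1.18GiB) in the allocator dump. The 51150 also matches if, as an assumption, this SSD ResNet50 FPN model places 6 anchors at every cell of five feature maps with 80, 40, 20, 10 and 5 cells per side on a 640x640 input; reading the 100 as the padded number of ground-truth boxes per image is likewise an assumption. If one such tensor is created per image (the 62 identical chunks are consistent with that, but it is not confirmed here), this intermediate alone grows linearly with the batch size. A small worked check, with `alloc_bytes` as a hypothetical helper:

```python
def alloc_bytes(shape, bytes_per_elem=4, granularity=256):
    """Bytes for a dense tensor, rounded up to an assumed 256-byte allocation granularity."""
    n = bytes_per_elem
    for dim in shape:
        n *= dim
    return -(-n // granularity) * granularity  # ceiling division


# Assumption: 6 anchors per cell on FPN feature maps of 80, 40, 20, 10 and 5 cells per side.
anchors = 6 * sum(side * side for side in (80, 40, 20, 10, 5))
print(anchors)  # 51150

per_image = alloc_bytes((100, anchors))
print(per_image)  # 20460032, the chunk size listed 62 times in the log

# Illustrative batch sizes only; the real value is batch_size in pipeline.config.
for batch_size in (64, 16, 4):
    gib = batch_size * per_image / 2**30
    print(f"batch_size={batch_size}: {gib:.2f} GiB for this one intermediate")
```

This is only one tensor out of many, but it shows why shrinking the batch size is the first thing to try when training runs out of GPU memory.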

DtXFS commented 3 years ago

In Step 15, do I have to put in the path of the pre-trained model or of the empty model folder? The names are not identical: the new folder is "my_ssd_resnet50_v1_fpn" and the downloaded folder is "ssd_resnet50_v1_fpn_640x640_coco17_tpu-8".

!python model_main_tf2.py --model_dir=models/my_ssd_resnet50_v1_fpn --pipeline_config_path=models/my_ssd_resnet50_v1_fpn/pipeline.config
# !python model_main_tf2.py --model_dir=models/[name_of_pre-trained-model_you_downloaded] --pipeline_config_path=models/[name_of_pre-trained-model_you_downloaded]/pipeline.config

Nkap23 commented 3 years ago

For TensorBoard, there should be a settings option in the top-right corner of the TensorBoard display. Open the settings and check the 'Reload data' box, so that TensorBoard reloads the data automatically. You can also set the reload period (if the reload period is 30, TensorBoard reloads every 30 seconds).
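TensorBoard's "No dashboards are active for the current data set." message generally means it found no event files under the directory passed to `--logdir`. Since the training run in this thread stopped with an OOM error, it may not have written any summaries yet, so it can help to check the folder directly. A minimal sketch, assuming event files follow the usual `events.out.tfevents.*` naming and sit somewhere under the model directory; `find_event_files` is a hypothetical helper:

```python
import glob
import os


def find_event_files(logdir):
    """Return TensorBoard event files anywhere under logdir (including logdir itself)."""
    # With recursive=True, "**" matches zero or more nested directories.
    pattern = os.path.join(logdir, "**", "events.out.tfevents.*")
    return sorted(glob.glob(pattern, recursive=True))


# Hypothetical path: the model folder passed as --model_dir in this thread.
found = find_event_files("models/my_ssd_resnet50_v1_fpn")
print(found or "No event files found, so TensorBoard has nothing to display yet.")
```

If this prints an empty result while training is supposedly running, the problem is in training (or in the path), not in TensorBoard's reload settings.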

The Step 15 error means the GPU is running out of memory (ResourceExhaustedError: OOM). Try reducing batch_size in the train_config section of pipeline.config and restarting the session! For --model_dir and --pipeline_config_path, use the path of the new, empty model folder that contains the modified pipeline.config file (models/my_ssd_resnet50_v1_fpn).
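Lowering the batch size means editing `batch_size` inside the `train_config { ... }` block of `pipeline.config`. Doing it by hand works fine; as a scriptable alternative, here is a minimal text-editing sketch, assuming `train_config {` appears once and its first `batch_size:` line is the one to change. It is a plain regex edit rather than a protobuf round-trip, `set_train_batch_size` is a hypothetical helper, and the sample values (64, 25000, 8) are illustrative only.

```python
import re


def set_train_batch_size(config_text, batch_size):
    """Return config_text with the first batch_size after `train_config {` replaced."""
    start = config_text.index("train_config {")  # raises ValueError if missing
    head, tail = config_text[:start], config_text[start:]
    tail, count = re.subn(r"batch_size:\s*\d+", f"batch_size: {batch_size}", tail, count=1)
    if count != 1:
        raise ValueError("no batch_size field found after train_config")
    return head + tail


sample = "model {\n}\ntrain_config {\n  batch_size: 64\n  num_steps: 25000\n}\n"
print(set_train_batch_size(sample, 8))

# Hypothetical usage on the file from this thread:
# path = "models/my_ssd_resnet50_v1_fpn/pipeline.config"
# with open(path) as f:
#     text = f.read()
# with open(path, "w") as f:
#     f.write(set_train_batch_size(text, 8))
```

After changing the value, restart the Colab session before training again so the GPU memory from the failed run is released.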

Nkap23 commented 3 years ago

Closing this due to lack of activity!