Closed by agentmorris.
Running on master at 2f4f5a42807a71abecafa08bf8b49b052efd0e16
(Comment originally posted by bencevans)
Yes, I've faced this before due to a corrupt file. I usually wrap the PIL.Image.open call in a try/except, and then resume the detector from a saved checkpoint to pick up where it left off.
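Roughly something like this (just a sketch with made-up names, not the actual run_tf_detector_batch code; the checkpoint file and helper functions here are hypothetical):

```python
import json
import os

import numpy as np
import PIL.Image

# Hypothetical checkpoint file recording which images have already been processed
CHECKPOINT_PATH = 'detector_checkpoint.json'


def load_image_safe(path):
    """Open an image, returning None (with a warning) if the file is corrupt."""
    try:
        return np.array(PIL.Image.open(path).convert('RGB'))
    except (OSError, IOError) as e:
        print('Warning: could not load {}: {}'.format(path, e))
        return None


def images_to_process(all_image_paths):
    """Skip anything already recorded in the checkpoint, so a crashed run can resume."""
    done = set()
    if os.path.exists(CHECKPOINT_PATH):
        with open(CHECKPOINT_PATH) as f:
            done = set(json.load(f))
    return [p for p in all_image_paths if p not in done]
```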
On Mon, Oct 14, 2019 at 10:29 AM Ben Evans notifications@github.com wrote:
Before I start digging into it further, has anyone come across the following problem? I've run the detector twice and got the same result both times, so I'm thinking along the lines of a corrupt file or a faulty disk.
Potentially a duplicate of #94 (https://github.com/microsoft/CameraTraps/issues/94), but that issue doesn't contain any logs, so I'm unsure.
$ PYTHONPATH=$PYTHONPATH:$(pwd) python3 detection/run_tf_detector_batch.py --recursive --forceCpu --checkpointFrequency 1000 --outputRelativeFilenames ./detection/megadetector_v3.pb ../borneo-dataset/release/0.5/SAFE/SAFE_2/ ../Borneo-0.5-SAFE2.txt
tensorflow tf version: 1.14.0
2019-10-03 12:28:29.084524: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: FMA
2019-10-03 12:28:29.381379: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400065000 Hz
2019-10-03 12:28:29.386639: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3d43f60 executing computations on platform Host. Devices:
2019-10-03 12:28:29.386727: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
tf_detector.py, tf.test.is_gpu_available: False
WARNING: Logging before flag parsing goes to stderr.
W1003 12:28:29.404660 140666144171840 deprecation_wrapper.py:119] From detection/run_tf_detector_batch.py:51: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.
W1003 12:28:29.405052 140666144171840 deprecation_wrapper.py:119] From detection/run_tf_detector_batch.py:51: The name tf.logging.ERROR is deprecated. Please use tf.compat.v1.logging.ERROR instead.
Running detector on 57170 images
Loading model...
tf_detector.py: Loading graph...
tf_detector.py: Detection graph loaded.
Loaded model in 15.1 seconds
Running detector...
0it [00:00, ?it/s]2019-10-03 12:28:55.971887: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
2019-10-03 12:29:01.395446: W tensorflow/core/framework/allocator.cc:107] Allocation of 377318400 exceeds 10% of system memory.
2019-10-03 12:29:01.487368: W tensorflow/core/framework/allocator.cc:107] Allocation of 377318400 exceeds 10% of system memory.
2019-10-03 12:29:01.982219: W tensorflow/core/framework/allocator.cc:107] Allocation of 99878400 exceeds 10% of system memory.
2019-10-03 12:29:02.555869: W tensorflow/core/framework/allocator.cc:107] Allocation of 159744000 exceeds 10% of system memory.
2019-10-03 12:29:02.715443: W tensorflow/core/framework/allocator.cc:107] Allocation of 159744000 exceeds 10% of system memory.
Checkpointing 1 images to /tmp/detector_batch/tmpud3njeui......done
1000it [2:11:24, 7.87s/it]Checkpointing 1001 images to /tmp/detector_batch/tmp6wgeh416......done
2000it [4:23:28, 7.90s/it]Checkpointing 2001 images to /tmp/detector_batch/tmp_6tnl91e......done
3000it [6:36:44, 8.07s/it]Checkpointing 3001 images to /tmp/detector_batch/tmpnwm51sek......done
3344it [7:23:24, 8.08s/it]Traceback (most recent call last):
  File "detection/run_tf_detector_batch.py", line 559, in <module>
    main()
  File "detection/run_tf_detector_batch.py", line 554, in main
    load_and_run_detector(options)
  File "detection/run_tf_detector_batch.py", line 437, in load_and_run_detector
    boxes,scores,classes,imageFileNames = generate_detections(detector,imageFileNames,options)
  File "detection/run_tf_detector_batch.py", line 167, in generate_detections
    imageNP = PIL.Image.open(image).convert("RGB"); imageNP = np.array(imageNP)
  File "/home/bencevans/.local/lib/python3.6/site-packages/PIL/Image.py", line 912, in convert
    self.load()
  File "/home/bencevans/.local/lib/python3.6/site-packages/PIL/ImageFile.py", line 261, in load
    raise_ioerror(err_code)
  File "/home/bencevans/.local/lib/python3.6/site-packages/PIL/ImageFile.py", line 58, in raise_ioerror
    raise IOError(message + " when reading image file")
OSError: broken data stream when reading image file
Exception ignored in: <bound method tqdm.__del__ of 3344it [7:23:25, 8.08s/it]>
Traceback (most recent call last):
  File "/home/bencevans/.local/lib/python3.6/site-packages/tqdm/_tqdm.py", line 931, in __del__
    self.close()
  File "/home/bencevans/.local/lib/python3.6/site-packages/tqdm/_tqdm.py", line 1133, in close
    self._decr_instances(self)
  File "/home/bencevans/.local/lib/python3.6/site-packages/tqdm/_tqdm.py", line 496, in _decr_instances
    cls.monitor.exit()
  File "/home/bencevans/.local/lib/python3.6/site-packages/tqdm/_monitor.py", line 52, in exit
    self.join()
  File "/usr/lib/python3.6/threading.py", line 1053, in join
    raise RuntimeError("cannot join current thread")
RuntimeError: cannot join current thread
--
Amrita Gupta
PhD Student
School of Computational Science & Engineering
Georgia Institute of Technology
(Comment originally posted by amritagupta)
Can you get the latest from master and try again? run_tf_detector_batch was updated last week to put both image loading and inference inside a try/except, with reasonable behavior for failed images (a warning is printed and no output is generated for that image).
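In rough outline, the handling is along these lines (a simplified sketch, not the actual repo code; generate_detections_one_image stands in for the real inference call):

```python
import numpy as np
import PIL.Image


def run_detector_on_images(detector, image_paths):
    """Run the detector over a list of images, skipping any that fail to load or fail inference."""
    results = []
    for path in image_paths:
        # Image loading wrapped in try/except: a corrupt file produces a warning, no output
        try:
            image = np.array(PIL.Image.open(path).convert('RGB'))
        except Exception as e:
            print('Warning: could not load image {}: {}'.format(path, e))
            continue
        # Inference wrapped in try/except as well
        try:
            results.append(detector.generate_detections_one_image(image, path))
        except Exception as e:
            print('Warning: inference failed on image {}: {}'.format(path, e))
            continue
    return results
```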
(Comment originally posted by agentmorris)
Thanks @amritagupta & @agentmorris. The try/except worked, and I've just checked the current master as well; it works as expected, bar a slight formatting issue which is addressed in #111.
(Comment originally posted by bencevans)
Before I start digging into it further, has anyone come across the following problem? I've run the detector twice and got the same result both times, so I'm thinking along the lines of a corrupt file or a faulty disk.
Potentially a duplicate of #94, but that issue doesn't contain any logs, so I'm unsure.
Issue cloned from Microsoft/CameraTraps, original issue posted by bencevans on Oct 14, 2019.