agentmorris / MegaDetector

MegaDetector is an AI model that helps conservation folks spend less time doing boring things with camera trap images.
MIT License

Program seems to hit a deadlock when using multiple cores #35

Closed: agentmorris closed this issue 1 year ago

agentmorris commented 1 year ago

Hello. To increase the performance of run_tf_detector_batch.py, I used the --ncores option you provided. I tried it with 2 and 4 cores, but each time the program gets stuck. Here are the last shell outputs I get:

Creating pool with 2 cores
TFDetector: Loading graph...
TFDetector: Loading graph...
TFDetector: Detection graph loaded.
TFDetector: Detection graph loaded.
Loaded model (batch level) in 6.78 seconds
Loaded model (batch level) in 6.78 seconds
Processing image /home/...
Processing image /home/...
2020-08-17 09:43:40.366492: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 20054016 exceeds 10% of system memory.
2020-08-17 09:43:40.366493: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 20054016 exceeds 10% of system memory.

When I don't use the --ncores option, I process 30 images in ~5 minutes. With the option, I've waited 10 minutes and it didn't process even one image.

Do you know what could cause this behaviour?


Issue cloned from Microsoft/CameraTraps, original issue posted by plrevolve on Aug 17, 2020.

agentmorris commented 1 year ago

I haven't seen that before... how much RAM do you have on your PC? And while it's running, can you check how much RAM this process is consuming (in Task Manager on Windows, or top on Linux) and how much RAM is allocated overall?

The allocation warning combined with the behavior you describe suggests that maybe you're hitting RAM limits (though I'm just guessing).


(Comment originally posted by agentmorris)

agentmorris commented 1 year ago

I have 8 GB of RAM. When I run the program without the --ncores option, it consumes 1 or 2 GB on top of the 4 GB already allocated. There is a strange behaviour when I run with the --ncores option: it doesn't allocate more memory (and my fans don't start to scream as usual), as if the program froze right after loading the model.


(Comment originally posted by plrevolve)

agentmorris commented 1 year ago

8 GB of RAM should be plenty here... I'm sorry, I'm at a loss, and I can't reproduce this issue.

Are you on Windows or on Linux? And there's nothing else unusual about your system that we're not seeing here, e.g. this isn't Windows running virtually on Linux or vice-versa? You haven't modified the script or other components of the code base? How many images are you running this on? If you're running this on a large number of images, have you tested it on just a handful for debugging?

Ah, non-reproducible issues. If developers wrote action movies, what all the evil villains would do is just find hard-to-reproduce bugs and send them out to all of their enemies so as to deny them sleep and/or joy.


(Comment originally posted by agentmorris)

agentmorris commented 1 year ago

I can confirm that using --ncores causes run_tf_detector_batch.py to hang on Ubuntu 18.04 / Python 3.6.9 / TensorFlow 1.15. The problem is that self.tf_session.run() in run_tf_detector.py never returns.

I think the issue has to do with the multiprocessing pool. The default start method on Linux is to fork rather than to spawn a child process as on Windows, and I don't believe TF is fork-friendly. To make the batch process fork-friendly, my understanding is that you would have to do all of the TF and session initialization in the child process.
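
As an illustration of that approach, here is a minimal, self-contained sketch (not the MegaDetector code; the helper names are made up): all TF setup happens inside each worker via a Pool initializer, so fork() never has to copy a live graph or session from the parent.

    import multiprocessing as mp

    _detector = None  # per-process global, populated by the initializer in each worker

    def _load_detector(model_file):
        # Placeholder for TFDetector-style setup: build the graph and tf.Session here,
        # inside the child process, never in the parent.
        return {'model_file': model_file}

    def _init_worker(model_file):
        global _detector
        _detector = _load_detector(model_file)

    def _process_one(image_path):
        # Placeholder for per-image inference using the worker-local detector
        return (image_path, _detector['model_file'])

    def run_batch(model_file, image_paths, n_cores):
        with mp.Pool(n_cores, initializer=_init_worker, initargs=(model_file,)) as pool:
            return pool.map(_process_one, image_paths)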


(Comment originally posted by persts)

agentmorris commented 1 year ago

Here is one way to get it to work on Ubuntu:

index 692b56e..0f9f70a 100644
--- a/detection/run_tf_detector_batch.py
+++ b/detection/run_tf_detector_batch.py
@@ -44,7 +44,7 @@ from functools import partial
 import humanfriendly
 from tqdm import tqdm
 # from multiprocessing.pool import ThreadPool as workerpool
-from multiprocessing.pool import Pool as workerpool
+import multiprocessing as mp

 from detection.run_tf_detector import ImagePathUtils, TFDetector
 import visualization.visualization_utils as viz_utils
@@ -197,7 +197,7 @@ def load_and_run_detector_batch(model_file, image_file_names, checkpoint_path=No
         if len(already_processed) > 0:
             print('Warning: when using multiprocessing, all images are reprocessed')

-        pool = workerpool(n_cores)
+        pool = mp.Pool(n_cores)

         image_batches = list(chunks_by_number_of_chunks(image_file_names, n_cores))
         results = pool.map(partial(process_images, tf_detector=tf_detector,
@@ -372,4 +372,5 @@ def main():

 if __name__ == '__main__':
+    mp.set_start_method('spawn', force=True)
     main()
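
One general caveat with the spawn approach (an observation about multiprocessing, not something specific to this diff): spawned children re-import the module and receive their arguments by pickling, so whatever is passed through partial/pool.map has to be picklable. Passing the model file path and letting each worker load its own graph is generally safer than trying to hand a live TF session across processes.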

(Comment originally posted by persts)

agentmorris commented 1 year ago

I'm indeed on Ubuntu 20.04, @agentmorris, with Python 3.6.12 and TensorFlow 1.15.0. I've tested it on batches of 10 to 30 images. Actually it doesn't surprise me; I've already had problems using TensorFlow across multiple threads. Thank you for your solution @persts, I'll definitely try it.


(Comment originally posted by plrevolve)

agentmorris commented 1 year ago

Ok, sadly your solution didn't work for me. But it looks like TensorFlow uses all my cores by default, so in the end it's not such a big deal. (Screenshot attached: Capture d’écran de 2020-08-20 11-47-19.)
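
For reference (this is an assumption about TF defaults, not something verified on this machine), TF 1.x sizes its intra-op and inter-op thread pools to the number of available cores unless told otherwise, which is why a single process can already keep every core busy. A minimal sketch of pinning those pools explicitly:

    import tensorflow as tf  # TF 1.x API, matching the versions reported in this thread

    config = tf.ConfigProto(
        intra_op_parallelism_threads=4,  # threads used inside a single op (e.g. a convolution)
        inter_op_parallelism_threads=2,  # number of ops that may run concurrently
    )
    sess = tf.Session(config=config)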


(Comment originally posted by plrevolve)

agentmorris commented 1 year ago

Just want to report that I'm seeing the same thing on Ubuntu 18 with 8 GB of RAM. When I switch to a single core, all cores are still used for a single image, so I'm not sure more cores would help.


(Comment originally posted by rbavery)