DLR-RM / BlenderProc

A procedural Blender pipeline for photorealistic training image generation
GNU General Public License v3.0

BrokenPipeError during BOP dataset generation #1084

Closed matteomastrogiuseppe closed 1 month ago

matteomastrogiuseppe commented 1 month ago

Describe the issue

Hi,

I get a BrokenPipeError: [Errno 32] Broken pipe while creating a BOP-formatted dataset. I'm running on Ubuntu 22.04 through WSL. The main script I'm running is very similar to your example:

blenderproc run examples/datasets/bop_challenge/main_<bop_dataset_name>_<random/upright>.py \
               <path_to_bop_data> \
               resources/cctextures \
               examples/datasets/bop_challenge/output \
               --num_scenes=2000
Error log:

3/25|10:55:26: Calculating GT masks - /.../blenderproc/output/dataset/train_pbr/000001, 0
3/25|10:55:29: Loading gt info from existing chunk dir - /.../blenderproc/output/dataset/train_pbr/000001
3/25|10:55:29: Calculating GT info - /.../blenderproc/output/dataset/train_pbr/000001, 0
Process ForkPoolWorker-3882:
Process ForkPoolWorker-3900:
Traceback (most recent call last):
  File "/home/simoc/blender/blender-3.5.1-linux-x64/3.5/python/lib/python3.10/multiprocessing/pool.py", line 131, in worker
    put((job, i, result))
  File "/home/simoc/blender/blender-3.5.1-linux-x64/3.5/python/lib/python3.10/multiprocessing/queues.py", line 377, in put
    self._writer.send_bytes(obj)
  File "/home/simoc/blender/blender-3.5.1-linux-x64/3.5/python/lib/python3.10/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/home/simoc/blender/blender-3.5.1-linux-x64/3.5/python/lib/python3.10/multiprocessing/connection.py", line 411, in _send_bytes
    self._send(header + buf)
  File "/home/simoc/blender/blender-3.5.1-linux-x64/3.5/python/lib/python3.10/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

The error occurs after roughly 150-200 scenes have been generated, with 10 poses per scene. What I observe is that RAM usage grows steadily, then swap fills up as well, and only then is the error raised.

Minimal code example

No response

Files required to run the code

No response

Expected behavior

My first impression is that some processes, threads, or files are being created and never properly closed.

I took a look at the implementation in blenderproc/python/writer/BopWriterUtility.py, and it seems that the pools are indeed not closed and joined explicitly: https://github.com/DLR-RM/BlenderProc/blob/9b50ba080c5e4d6ceae832a74bbd9bdc1d66ff4f/blenderproc/python/writer/BopWriterUtility.py#L170-L182

This is not recommended. From the Python multiprocessing documentation:

[!CAUTION] Warning: multiprocessing.pool objects have internal resources that need to be properly managed (like any other resource) by using the pool as a context manager or by calling close() and terminate() manually. Failure to do this can lead to the process hanging on finalization. Note that it is not correct to rely on the garbage collector to destroy the pool as CPython does not assure that the finalizer of the pool will be called (see object.__del__() for more information).

This is probably why the machine's RAM usage keeps growing during execution. Adding two simple lines of code at the end of the method fixes the issue (for me):

 pool.close()
 pool.join() 
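
For anyone hitting the same problem, here is a minimal sketch of the pattern, assuming a new pool is created for each chunk of work as in the linked code. The names `process_chunk`, `process_chunk_with_context_manager`, `func`, and `num_workers` are hypothetical and are not part of BlenderProc's API; `abs` only stands in for the real per-image work.

```python
import multiprocessing as mp


def process_chunk(func, values, num_workers=2):
    """Run one batch of work in a short-lived pool, then release its workers.

    Hypothetical helper for illustration; not BlenderProc code.
    """
    pool = mp.Pool(num_workers)
    try:
        results = pool.map(func, values)  # blocks until every result is back
    finally:
        pool.close()  # no further tasks will be submitted to this pool
        pool.join()   # wait for the worker processes to exit
    return results


def process_chunk_with_context_manager(func, values, num_workers=2):
    """Same idea, using the pool as a context manager.

    Leaving the with-block calls pool.terminate(), which is safe here
    because map() has already returned all results.
    """
    with mp.Pool(num_workers) as pool:
        return pool.map(func, values)


if __name__ == "__main__":
    # Simulate several chunks, each with its own pool.
    for chunk_id in range(3):
        print(chunk_id, process_chunk(abs, [-1, 2, -3]))
```

Either variant should keep worker processes from accumulating across chunks. The explicit close() and join() form matches the two lines above.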

I'm opening a PR. Let me know if you need any extra info (unfortunately, it's hard to reproduce this issue without letting it run for a long time).

BlenderProc version

2.7.0