desihub / desisim

DESI simulations
BSD 3-Clause "New" or "Revised" License
16 stars 22 forks source link

Make multiprocessing Pool safe in quickquasars #568

Closed andreicuceu closed 2 years ago

andreicuceu commented 2 years ago

The best practice for using multiprocessing.Pool is within a "with as" statement to safely terminate the pool.

We (with Naim) found a bug when nproc > nfiles. This is the error message at the end of running quickquasars:

Exception ignored in: <function Pool.del at 0x1555439bcf70> Traceback (most recent call last): File "/global/homes/a/acuceu/.conda/envs/desienv/lib/python3.9/multiprocessing/pool.py", line 268, in del self._change_notifier.put(None) File "/global/homes/a/acuceu/.conda/envs/desienv/lib/python3.9/multiprocessing/queues.py", line 378, in put self._writer.send_bytes(obj) File "/global/homes/a/acuceu/.conda/envs/desienv/lib/python3.9/multiprocessing/connection.py", line 205, in send_bytes self._send_bytes(m[offset:offset + size]) File "/global/homes/a/acuceu/.conda/envs/desienv/lib/python3.9/multiprocessing/connection.py", line 416, in _send_bytes self._send(header + buf) File "/global/homes/a/acuceu/.conda/envs/desienv/lib/python3.9/multiprocessing/connection.py", line 373, in _send n = write(self._handle, buf) OSError: [Errno 9] Bad file descriptor

This goes away when using "with multiprocessing.Pool(args.nproc) as pool:" to clean the pool after it's used.

andreufont commented 2 years ago

@andreicuceu - why is this not merged? @alxogm ?

andreicuceu commented 2 years ago

@andreicuceu - why is this not merged? @alxogm ?

Not sure. The coverage test failed for some reason, and the output isn't available anymore. Can you rerun the automatic tests?

alxogm commented 2 years ago

@andreicuceu - why is this not merged? @alxogm ?

It seems I missed this PR. I'm happy to merge if was already tested.

alxogm commented 2 years ago

@andreicuceu it seems the automatic tests can not be re-run at this stage, but you can make small push to trigger them again. Alternatively we can include your change in Hiram's new PR.

andreicuceu commented 2 years ago

@alxogm I added a small comment, so this should trigger another run of the automatic tests, but it's awaiting approval.

andreicuceu commented 2 years ago

@sbailey @alxogm It looks to me like the problem appears because I did the pull request back when the main branch was still called "master" whereas now it's called "main". Not sure what to do. I could delete this pull request and start a new one on the most current version with the same change. It's just two lines of code after all. Have any pull requests been pushed since the change? What did you do for them?

alxogm commented 2 years ago

Hmm, I think making a fresh PR will be the easiest for you, we have not made recent PRs, so I haven't had that issue before ...

sbailey commented 2 years ago

Yes, let's do a new PR off of current main. Thanks.