cds-astro / mocpy

Python library to easily create and manipulate MOCs (Multi-Order Coverage maps)
https://cds-astro.github.io/mocpy/
BSD 3-Clause "New" or "Revised" License
59 stars 33 forks source link

mocpy probability calculation when using multiprocessing #132

Closed mcoughlin closed 3 months ago

mcoughlin commented 4 months ago

We have been putting moc. probability_in_multiordermap(skymap)

within a parallel loop using multiprocessing to parallelize across mocs, and see errors like the below:

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File “/hildafs/home/oconnorb/.conda/envs/gwemopt/lib/python3.11/site-packages/mocpy/moc/moc.py”, line 771, in probability_in_multiordermap return mocpy.multiorder_probdens_map_sum_in_smoc( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ OSError: MOC at index ‘3’ not found Exception ignored in: <function AbstractMOC.del at 0x14a65c27f9c0> Traceback (most recent call last): File “/hildafs/home/oconnorb/.conda/envs/gwemopt/lib/python3.11/site-packages/mocpy/abstract_moc.py”, line 24, in del mocpy.drop(self.store_index) OSError: MOC at index ‘3’ not found Exception ignored in: <function AbstractMOC.del at 0x14a65c27f9c0> Traceback (most recent call last): File “/hildafs/home/oconnorb/.conda/envs/gwemopt/lib/python3.11/site-packages/mocpy/abstract_moc.py”, line 24, in del mocpy.drop(self.store_index) OSError: MOC at index ‘2’ not found

Do you have a sense why this might be? It seems deep within the package, so a bit hard to tell what might be the problem.

ManonMarchand commented 4 months ago

Ah yes. Sorry it's a duplicate of #102. We have a vague idea about how to solve this. I'll have a try in the next days. Thanks for reporting

fxpineau commented 4 months ago

Yes, this is why we put in place a ref_count in the Rust storage and a copy method in the MOCPy Rust source. But so far we have been lacking pickle (called behind the scenes when using multi-threading in Python) knowledge to know where to call the copy method...

fxpineau commented 4 months ago

@mcoughlin 1 - Just ins case: are you using the same MOC in more than 255 threads? (The reference count is an unsigned byte, so limited to 255) 2 - Would you provide us with a minimal repoductible example?

mcoughlin commented 4 months ago
  1. Nope!
  2. How about:
from mocpy import MOC
from pathos.multiprocessing import ProcessPool
pool = ProcessPool()
results = pool.map(lambda moc: MOC.from_string("0/0"), [j for j in range(32)], chunksize=16)
ManonMarchand commented 4 months ago

Thanks! Looks like multiprocessing calls pickle behind the scene, the bug is narrowed down in issue https://github.com/cds-astro/mocpy/issues/133 We're getting closer

(traceback for the snippet above)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/manon.marchand/.conda/envs/mocpy-dev/lib/python3.11/site-packages/mocpy/abstract_moc.py", line 20, in __repr__
    return self.to_string(format="ascii", fold=80)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/manon.marchand/.conda/envs/mocpy-dev/lib/python3.11/site-packages/mocpy/abstract_moc.py", line 589, in to_string
    return mocpy.to_ascii_str_with_fold(self.store_index, fold)[:-1]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: MOC at index '0' not found
ManonMarchand commented 4 months ago

@fxpineau :

If we add a deep copy, this example now raises: Reason: 'OSError('Unable to copy MOC: 255 copies already reached')

ManonMarchand commented 4 months ago

@mcoughlin we merged a fix to the pickle issue. It works well with your minimal example with the multiprocessing library too. If you confirm that your more complex use case is also solved, we'll do a release. Thanks again for reporting the issue

mcoughlin commented 4 months ago

@ManonMarchand I now see:

Exception ignored in: <function AbstractMOC.del at 0x308c39e40> Traceback (most recent call last): File "/Users/mcoughlin/miniconda3/envs/gwemopt-rust/lib/python3.11/site-packages/mocpy/abstract_moc.py", line 24, in del mocpy.drop(self.store_index) ^^^^^^^^^^^^^^^^ AttributeError: 'MOC' object has no attribute 'store_index'

mcoughlin commented 4 months ago

@ManonMarchand I actually see both:

Exception ignored in: <function AbstractMOC.del at 0x308839e40> Traceback (most recent call last): File "/Users/mcoughlin/miniconda3/envs/gwemopt-rust/lib/python3.11/site-packages/mocpy/abstract_moc.py", line 24, in del mocpy.drop(self.store_index) ^^^^^^^^^^^^^^^^ AttributeError: 'MOC' object has no attribute 'store_index' MOC creation: 0%| | 32/45713 [00:02<1:10:46, 10.76it/s] joblib.externals.loky.process_executor._RemoteTraceback: """ Traceback (most recent call last): File "/Users/mcoughlin/miniconda3/envs/gwemopt-rust/lib/python3.11/site-packages/joblib/externals/loky/process_executor.py", line 661, in wait_result_broken_or_wakeup result_item = result_reader.recv() ^^^^^^^^^^^^^^^^^^^^ File "/Users/mcoughlin/miniconda3/envs/gwemopt-rust/lib/python3.11/multiprocessing/connection.py", line 251, in recv return _ForkingPickler.loads(buf.getbuffer()) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/mcoughlin/miniconda3/envs/gwemopt-rust/lib/python3.11/site-packages/mocpy/abstract_moc.py", line 57, in setstate mocpy.copy(state["store_index"]) OSError: MOC at index '1' not found

ManonMarchand commented 4 months ago

Ok thanks. Still looking then ^^'

mcoughlin commented 4 months ago

@ManonMarchand Any chance we can do a release before this is completely solved? I see that pypi won't let me do a release just pinning main:

Can't have direct dependency: mocpy@ git+https://github.com/cds-astro/mocpy

fxpineau commented 4 months ago

@mcoughlin Manon is going to answer with details (TL;DR: the cleaner way to support multi-threading is also the slowest one due to pickle and/or multiple python instances loading their own Rust MOC library). Would you like us to add a static method in MOC taking an array of MOCs and a MultiOrderMap so we can multi-thread at the Rust level instead of at the Python level, without any copy of MOCs data?

mcoughlin commented 4 months ago

@fxpineau multi-threading at the Rust level sounds exactly like the right thing to do. Right now I have been trying to thread also at the MOC creation level, it would also be really cool to be able to pass a bunch of polygons and get a bunch of MOCs back.

fxpineau commented 4 months ago

We are giving a try in branch 'par'...

mcoughlin commented 4 months ago

@fxpineau awesome. For example, that will be a drop-in replacement for the code here: https://github.com/skyportal/gwemopt/blob/main/gwemopt/tiles.py#L1107 where I have the loops or parallelize it. I also have the sense you will (or already do) have something smarter for the MOC generation, which I do here: https://github.com/skyportal/gwemopt/blob/main/gwemopt/tiles.py#L1107

ManonMarchand commented 4 months ago

Hi!

So, for future reference, we discovered that the multiprocessing module of python actually spawns new python instances in which the calculations are done. In terms of what happens on the rust side, it also means that different rust instances coexist.

Our former way of pickling MOCs was to pickle the index of the MOC (which is a pointer). As such, the python side only manipulates an int while the rust part manages the real in-memory MOCs). This approach could not work when different instances of python/rust had to communicate between each other : the int in the python side does not represent a moc and it was the only thing that could be exchanged through pickle.

So we made the choice to store the json serialization of the MOCs in the pickles. This means that now two independent instances of rust/python can communicate through pickles. It also means that we send big objects instead of just an int. Hence our doubts about performances.

We also were blocked by a garbage collector issue. When executing your minimal reproducible example, the list of mocs in the results variable has to be explicitly erased with del results when not used anymore. Otherwise, the python's garbage collector cleans the rust part of the mocpy library before the mocs (and thus has no was of knowing how to delete them). This can be prevented when using the pool in a with context as seems to be the most common way of doing.

This change of the pickle behavior is already merged in master.

tboch commented 4 months ago

@mcoughlin Manon is going to answer with details (TL;DR: the cleaner way to support multi-threading is also the slowest one due to pickle and/or multiple python instances loading their own Rust MOC library). Would you like us to add a static method in MOC taking an array of MOCs and a MultiOrderMap so we can multi-thread at the Rust level instead of at the Python level, without any copy of MOCs data?

Well, Python users of mocpy would expect the library to work in multiprocessing context, thus this is not the best solution to me.

ManonMarchand commented 4 months ago

@tboch : We'll go with both. The multiprocessing seems fixed by the switch in pickling recipe, and on top of this, some methods can be added to manage multi-threading on the rust side when they are especially critical.

(thus the switch from bug to feature request: the bug seems fixed, the features are incomming)

tboch commented 4 months ago

@tboch : We'll go with both. The multiprocessing seems fixed by the switch in pickling recipe, and on top of this, some methods can be added to manage multi-threading on the rust side when they are especially critical.

(thus the switch from bug to feature request: the bug seems fixed, the features are incomming)

ok, I misunderstood, sorry, my bad.

ManonMarchand commented 3 months ago

The new features are added. We'll do a release today.

mcoughlin commented 3 months ago

Thanks @ManonMarchand and @fxpineau for all your work.