Closed mcoughlin closed 3 months ago
Ah yes. Sorry it's a duplicate of #102. We have a vague idea about how to solve this. I'll have a try in the next days. Thanks for reporting
Yes, this is why we put in place a ref_count
in the Rust storage and a copy
method in the MOCPy Rust source.
But so far we have been lacking pickle (called behind the scenes when using multi-threading in Python) knowledge to know where to call the copy
method...
@mcoughlin 1 - Just ins case: are you using the same MOC in more than 255 threads? (The reference count is an unsigned byte, so limited to 255) 2 - Would you provide us with a minimal repoductible example?
from mocpy import MOC
from pathos.multiprocessing import ProcessPool
pool = ProcessPool()
results = pool.map(lambda moc: MOC.from_string("0/0"), [j for j in range(32)], chunksize=16)
Thanks! Looks like multiprocessing calls pickle behind the scene, the bug is narrowed down in issue https://github.com/cds-astro/mocpy/issues/133 We're getting closer
(traceback for the snippet above)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/manon.marchand/.conda/envs/mocpy-dev/lib/python3.11/site-packages/mocpy/abstract_moc.py", line 20, in __repr__
return self.to_string(format="ascii", fold=80)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/manon.marchand/.conda/envs/mocpy-dev/lib/python3.11/site-packages/mocpy/abstract_moc.py", line 589, in to_string
return mocpy.to_ascii_str_with_fold(self.store_index, fold)[:-1]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: MOC at index '0' not found
@fxpineau :
If we add a deep copy, this example now raises: Reason: 'OSError('Unable to copy MOC: 255 copies already reached')
@mcoughlin we merged a fix to the pickle issue. It works well with your minimal example with the multiprocessing library too. If you confirm that your more complex use case is also solved, we'll do a release. Thanks again for reporting the issue
@ManonMarchand I now see:
Exception ignored in: <function AbstractMOC.del at 0x308c39e40> Traceback (most recent call last): File "/Users/mcoughlin/miniconda3/envs/gwemopt-rust/lib/python3.11/site-packages/mocpy/abstract_moc.py", line 24, in del mocpy.drop(self.store_index) ^^^^^^^^^^^^^^^^ AttributeError: 'MOC' object has no attribute 'store_index'
@ManonMarchand I actually see both:
Exception ignored in: <function AbstractMOC.del at 0x308839e40> Traceback (most recent call last): File "/Users/mcoughlin/miniconda3/envs/gwemopt-rust/lib/python3.11/site-packages/mocpy/abstract_moc.py", line 24, in del mocpy.drop(self.store_index) ^^^^^^^^^^^^^^^^ AttributeError: 'MOC' object has no attribute 'store_index' MOC creation: 0%| | 32/45713 [00:02<1:10:46, 10.76it/s] joblib.externals.loky.process_executor._RemoteTraceback: """ Traceback (most recent call last): File "/Users/mcoughlin/miniconda3/envs/gwemopt-rust/lib/python3.11/site-packages/joblib/externals/loky/process_executor.py", line 661, in wait_result_broken_or_wakeup result_item = result_reader.recv() ^^^^^^^^^^^^^^^^^^^^ File "/Users/mcoughlin/miniconda3/envs/gwemopt-rust/lib/python3.11/multiprocessing/connection.py", line 251, in recv return _ForkingPickler.loads(buf.getbuffer()) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/mcoughlin/miniconda3/envs/gwemopt-rust/lib/python3.11/site-packages/mocpy/abstract_moc.py", line 57, in setstate mocpy.copy(state["store_index"]) OSError: MOC at index '1' not found
Ok thanks. Still looking then ^^'
@ManonMarchand Any chance we can do a release before this is completely solved? I see that pypi won't let me do a release just pinning main:
Can't have direct dependency: mocpy@ git+https://github.com/cds-astro/mocpy
@mcoughlin Manon is going to answer with details (TL;DR: the cleaner way to support multi-threading is also the slowest one due to pickle and/or multiple python instances loading their own Rust MOC library). Would you like us to add a static method in MOC taking an array of MOCs and a MultiOrderMap so we can multi-thread at the Rust level instead of at the Python level, without any copy of MOCs data?
@fxpineau multi-threading at the Rust level sounds exactly like the right thing to do. Right now I have been trying to thread also at the MOC creation level, it would also be really cool to be able to pass a bunch of polygons and get a bunch of MOCs back.
We are giving a try in branch 'par'...
@fxpineau awesome. For example, that will be a drop-in replacement for the code here: https://github.com/skyportal/gwemopt/blob/main/gwemopt/tiles.py#L1107 where I have the loops or parallelize it. I also have the sense you will (or already do) have something smarter for the MOC generation, which I do here: https://github.com/skyportal/gwemopt/blob/main/gwemopt/tiles.py#L1107
Hi!
So, for future reference, we discovered that the multiprocessing module of python actually spawns new python instances in which the calculations are done. In terms of what happens on the rust side, it also means that different rust instances coexist.
Our former way of pickling MOCs was to pickle the index of the MOC (which is a pointer). As such, the python side only manipulates an int while the rust part manages the real in-memory MOCs). This approach could not work when different instances of python/rust had to communicate between each other : the int in the python side does not represent a moc and it was the only thing that could be exchanged through pickle.
So we made the choice to store the json serialization of the MOCs in the pickles. This means that now two independent instances of rust/python can communicate through pickles. It also means that we send big objects instead of just an int. Hence our doubts about performances.
We also were blocked by a garbage collector issue. When executing your minimal reproducible example, the list of mocs in the results
variable has to be explicitly erased with del results
when not used anymore. Otherwise, the python's garbage collector cleans the rust part of the mocpy library before the mocs (and thus has no was of knowing how to delete them).
This can be prevented when using the pool in a with
context as seems to be the most common way of doing.
This change of the pickle behavior is already merged in master.
@mcoughlin Manon is going to answer with details (TL;DR: the cleaner way to support multi-threading is also the slowest one due to pickle and/or multiple python instances loading their own Rust MOC library). Would you like us to add a static method in MOC taking an array of MOCs and a MultiOrderMap so we can multi-thread at the Rust level instead of at the Python level, without any copy of MOCs data?
Well, Python users of mocpy would expect the library to work in multiprocessing context, thus this is not the best solution to me.
@tboch : We'll go with both. The multiprocessing seems fixed by the switch in pickling recipe, and on top of this, some methods can be added to manage multi-threading on the rust side when they are especially critical.
(thus the switch from bug to feature request: the bug seems fixed, the features are incomming)
@tboch : We'll go with both. The multiprocessing seems fixed by the switch in pickling recipe, and on top of this, some methods can be added to manage multi-threading on the rust side when they are especially critical.
(thus the switch from bug to feature request: the bug seems fixed, the features are incomming)
ok, I misunderstood, sorry, my bad.
The new features are added. We'll do a release today.
Thanks @ManonMarchand and @fxpineau for all your work.
We have been putting moc. probability_in_multiordermap(skymap)
within a parallel loop using multiprocessing to parallelize across mocs, and see errors like the below:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File “/hildafs/home/oconnorb/.conda/envs/gwemopt/lib/python3.11/site-packages/mocpy/moc/moc.py”, line 771, in probability_in_multiordermap return mocpy.multiorder_probdens_map_sum_in_smoc( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ OSError: MOC at index ‘3’ not found Exception ignored in: <function AbstractMOC.del at 0x14a65c27f9c0> Traceback (most recent call last): File “/hildafs/home/oconnorb/.conda/envs/gwemopt/lib/python3.11/site-packages/mocpy/abstract_moc.py”, line 24, in del mocpy.drop(self.store_index) OSError: MOC at index ‘3’ not found Exception ignored in: <function AbstractMOC.del at 0x14a65c27f9c0> Traceback (most recent call last): File “/hildafs/home/oconnorb/.conda/envs/gwemopt/lib/python3.11/site-packages/mocpy/abstract_moc.py”, line 24, in del mocpy.drop(self.store_index) OSError: MOC at index ‘2’ not found
Do you have a sense why this might be? It seems deep within the package, so a bit hard to tell what might be the problem.