Closed edoaltamura closed 11 months ago
@edoaltamura thanks for raising the issue. A couple of questions:

1. As far as I can see, there is no __getstate__ or __setstate__ in the swiftsimio.reader.SWIFTDataset class (or swiftsimio.reader.__SWIFTParticleDataset, etc.). Since the swiftgalaxy classes either inherit from or wrap around these, I think that we would first need support on the swiftsimio side. If possible, could you try replacing the SWIFTGalaxy in your code with a SWIFTDataset instance and see whether that works? If that succeeds, I'll look at supporting serialising a SWIFTGalaxy; if it fails, you should first raise an issue on swiftsimio (referencing this one).
2. Could you share a minimal example of how you are using multiprocessing? This way, if I try to implement something, I can test whether it achieves what you need.

You could also check whether something more flexible than pickle, such as dill, would work. There's a fork of multiprocessing implemented with dill, called multiprocess, that could facilitate this.
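To make the __getstate__/__setstate__ discussion concrete, here is a minimal sketch of what pickling support usually looks like for a class that wraps an unpicklable resource. The LazyReader class, its attributes, and the roundtrip helper are hypothetical stand-ins, not part of swiftsimio or swiftgalaxy, and the real classes may need a quite different approach.

```python
import os
import pickle
import tempfile


class LazyReader:
    """Hypothetical stand-in for a dataset class holding an open file handle."""

    def __init__(self, path):
        self.path = path
        self._handle = open(path, "rb")  # open file objects cannot be pickled

    def read(self):
        self._handle.seek(0)
        return self._handle.read()

    def close(self):
        self._handle.close()

    def __getstate__(self):
        # Copy the instance dict and drop the unpicklable handle.
        state = self.__dict__.copy()
        del state["_handle"]
        return state

    def __setstate__(self, state):
        # Restore the plain attributes, then reopen the file from its path.
        self.__dict__.update(state)
        self._handle = open(self.path, "rb")


def roundtrip(obj):
    """Serialise and deserialise obj, roughly as multiprocessing would."""
    return pickle.loads(pickle.dumps(obj))


# Usage: write a small temporary file, then pickle and unpickle a reader.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"particle data")

reader = LazyReader(tmp.name)
clone = roundtrip(reader)
data = clone.read()

reader.close()
clone.close()
os.remove(tmp.name)
```

The design choice here is to pickle only the information needed to rebuild the object (the path) and to recreate the handle on the receiving side, so each worker process ends up with its own file handle.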
@edoaltamura is this still an issue for you?
I found that using prefer="threads" in joblib avoids the issue with pickling. This is a minimal example:
from typing import Any, List

from joblib import Parallel, delayed
from swiftgalaxy import SWIFTGalaxy


def process_galaxy_method(galaxy: SWIFTGalaxy) -> Any:
    ...


galaxy_objects_list: List[SWIFTGalaxy] = [...]

# Threads share memory, so the SWIFTGalaxy instances are never pickled
results = Parallel(n_jobs=-1, prefer="threads")(
    delayed(process_galaxy_method)(galaxy) for galaxy in galaxy_objects_list
)
# Now unpack the results list for further use
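The reason the thread-based workaround sidesteps the error can be shown with the standard library alone. In this sketch, Galaxy is a hypothetical stand-in for an unpicklable object (it holds a lock), and process and is_picklable are illustrative helpers; none of these names come from swiftgalaxy or joblib.

```python
import pickle
import threading
from concurrent.futures import ThreadPoolExecutor


class Galaxy:
    """Hypothetical stand-in for an object that cannot be pickled."""

    def __init__(self, mass):
        self.mass = mass
        self._lock = threading.Lock()  # lock objects are not picklable


def process(galaxy):
    # Placeholder for the per-galaxy analysis.
    with galaxy._lock:
        return 2.0 * galaxy.mass


def is_picklable(obj):
    """Return True if pickle can serialise obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


galaxies = [Galaxy(m) for m in (1.0, 2.0, 3.0)]

# Threads run in the same process and share memory, so the objects are
# passed by reference and pickle is never involved.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(process, galaxies))
```

One caveat with this approach: in standard CPython, threads are constrained by the global interpreter lock, so the speed-up is mostly limited to work that releases it, such as I/O or large array operations in compiled libraries.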
OK. I think that implementing serialisation would be very labour-intensive and currently has no identified benefit, so I'm closing this as unplanned.
When parallelising operations using SWIFTGalaxy instances and multiprocessing, I run into this pickling error:

As soon as multiprocessing attempts to communicate the task via pickle, the state of the instance is undefined, causing the code to crash. To allow the pickling/unpickling of the SWIFTGalaxy, it would be necessary to implement the following methods:

Would it be possible to add these to the class definitions? Thanks in advance for the support.