SWIFTSIM / swiftgalaxy

Load in particles of a simulated galaxy, rotate coordinates, easy spherical/cylindrical coordinates, access integrated properties, and more.
GNU General Public License v3.0

Pickling for swiftgalaxy.reader._SWIFTParticleDatasetHelper #4

Closed edoaltamura closed 11 months ago

edoaltamura commented 1 year ago

When parallelising operations on SWIFTGalaxy instances with multiprocessing, I run into this pickling error:

PicklingError: Can't pickle <class 'swiftgalaxy.reader.GasDatasetHelper'>: attribute lookup GasDatasetHelper on swiftgalaxy.reader failed
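
An error of this form can be reproduced in isolation with the standard library alone. The sketch below assumes, without confirmation from the swiftgalaxy source, that the helper class is built at runtime, so pickle cannot find it by name in its module. `make_helper_class` is a hypothetical stand-in and not part of swiftgalaxy.

```python
import pickle


def make_helper_class(name):
    # Hypothetical stand-in: build a class at runtime, so that no
    # module-level attribute with this name exists for pickle to look up.
    return type(name, (object,), {})


# Instance of a class that exists only as a runtime object
helper = make_helper_class("GasDatasetHelper")()

try:
    pickle.dumps(helper)
    error_message = None
except (pickle.PicklingError, AttributeError) as err:
    # pickle stores classes by reference (module name + class name),
    # and the lookup of that name fails here
    error_message = str(err)

print(error_message)
```

Pickle serialises an instance's class by reference, so any class that cannot be re-imported by name from its module triggers this failure, whatever the instance contains.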

As soon as multiprocessing tries to send the task to a worker process via pickle, the pickling state of the instance is undefined and the code crashes. To allow pickling/unpickling of a SWIFTGalaxy, it would be necessary to implement the following methods:

def __getstate__(self):
    # Define the pickling state of the object:
    # return a dictionary of the object's attributes to be pickled
    # (placeholder: drop any attributes that cannot be pickled)
    return self.__dict__.copy()

def __setstate__(self, state):
    # Restore the object's state from the pickled state dictionary:
    # assign the attributes from the state dictionary to the object
    self.__dict__.update(state)

Would it be possible to add these to the class definitions? Thanks in advance for the support.
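
As a minimal sketch of how such methods behave, the example below uses a hypothetical `GalaxyLike` class (not part of swiftgalaxy) that holds an unpicklable attribute, here a `threading.Lock`. `__getstate__` drops the lock and `__setstate__` recreates it. This only addresses unpicklable instance state; whether it would be sufficient for SWIFTGalaxy is not established here.

```python
import pickle
import threading


class GalaxyLike:
    """Hypothetical stand-in for an object holding an unpicklable resource."""

    def __init__(self, label):
        self.label = label
        self._lock = threading.Lock()  # lock objects cannot be pickled

    def __getstate__(self):
        # Copy the attribute dictionary and drop the unpicklable entry
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the pickled attributes, then rebuild the dropped resource
        self.__dict__.update(state)
        self._lock = threading.Lock()


original = GalaxyLike("halo_0")
restored = pickle.loads(pickle.dumps(original))
print(restored.label)
```

Without the two methods, `pickle.dumps(original)` would raise a `TypeError` because of the lock attribute.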

kyleaoman commented 1 year ago

@edoaltamura thanks for raising the issue. A couple of questions:

kyleaoman commented 1 year ago

Could also check whether something more flexible than pickle, such as dill, would work. There's a fork of multiprocessing implemented with dill called multiprocess that could facilitate this.

kyleaoman commented 11 months ago

@edoaltamura is this still an issue for you?

edoaltamura commented 11 months ago

I found that using prefer="threads" in joblib avoids the issue with pickling. This is a minimal example:

from typing import Any, List

from joblib import Parallel, delayed
from swiftgalaxy import SWIFTGalaxy


def process_galaxy_method(galaxy: SWIFTGalaxy) -> Any:
    # Per-galaxy analysis goes here
    ...


galaxy_objects_list: List[SWIFTGalaxy] = [...]

# Thread-based workers share memory, so the galaxies are not pickled
results = Parallel(n_jobs=-1, prefer="threads")(
    delayed(process_galaxy_method)(galaxy) for galaxy in galaxy_objects_list
)

# Now unpack the results list for further use
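
The reason threads sidestep the error is that they share the parent process's memory, so arguments are handed over by reference rather than serialised. A rough standard-library illustration of the same idea follows, using `concurrent.futures.ThreadPoolExecutor` instead of joblib. `make_unpicklable` and `process` are hypothetical stand-ins for a SWIFTGalaxy and its analysis function.

```python
from concurrent.futures import ThreadPoolExecutor


def make_unpicklable(value):
    # Hypothetical stand-in: an instance of a class built at runtime,
    # which pickle cannot serialise by reference.
    cls = type("GasDatasetHelper", (object,), {})
    obj = cls()
    obj.value = value
    return obj


def process(obj):
    # Placeholder for the per-galaxy computation
    return obj.value * 2


objects = [make_unpicklable(v) for v in range(4)]

# Threads receive the objects directly; nothing is pickled
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(process, objects))

print(results)  # [0, 2, 4, 6]
```

One caveat: in CPython, threads share the global interpreter lock, so any speed-up depends on how much of the work releases it (for example I/O or large NumPy operations).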

kyleaoman commented 11 months ago

OK. I think that implementing serialization would be very labour-intensive and currently has no identified benefit, so I'm closing this as unplanned.