cloudpipe / cloudpickle

Extended pickling support for Python objects

Cannot ignore locks with cloudpickle #447

Open ryanthompson591 opened 3 years ago

ryanthompson591 commented 3 years ago

I have an object that I would like to pickle. Unfortunately, the object has a lock within it.

Is there a way to set up cloudpickle to ignore locks so that it doesn't crash?

import threading
import cloudpickle

class ThirdPartyClass:
    def __init__(self):
        self.internal_lock = threading.RLock()
        self.something_I_want_pickled = 'important string'

cloudpickle.dumps(ThirdPartyClass())  # raises TypeError: the RLock cannot be pickled

See also #81

ryanthompson591 commented 3 years ago

Here is a workaround, though I'd still suggest adding some sort of API for this.

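import io
from _thread import RLock as RLockType  # the type of threading.RLock() objects

import cloudpickle

def _pickle_rlock(obj):
    # Reduce an RLock to "call _create_rlock() with no arguments";
    # the lock's current state is not preserved.
    return _create_rlock, ()

def _create_rlock():
    # Called at unpickling time: returns a new, unlocked RLock.
    return RLockType()

def pickle_with_rlock_supported(obj):
    with io.BytesIO() as file:
        pickler = cloudpickle.CloudPickler(file)
        # Route RLock instances to the custom reducer above.
        pickler.dispatch_table[RLockType] = _pickle_rlock
        pickler.dump(obj)
        return file.getvalue()
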
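A variant of the workaround, sketched under one assumption: that, as in recent cloudpickle releases, CloudPickler.dispatch_table is a ChainMap defined on the class. If so, pickler.dispatch_table[RLockType] = ... in the workaround writes into cloudpickle's shared table, and the RLock reducer stays registered for every later CloudPickler in the process. Subclassing and chaining a private mapping in front keeps the override local. The names RLockDroppingPickler, _reduce_rlock and dumps_dropping_rlocks are hypothetical, and like the original workaround this silently replaces each RLock with a fresh, unlocked one.

```python
import io
import pickle
import threading
from collections import ChainMap
from _thread import RLock as RLockType  # concrete type of threading.RLock() in CPython

import cloudpickle


def _reduce_rlock(lock):
    # Drop the lock's state: unpickling calls RLockType() with no arguments,
    # which yields a new, unlocked RLock.
    return RLockType, ()


class RLockDroppingPickler(cloudpickle.CloudPickler):
    # pickle reads a subclass's dispatch_table when the pickler is created.
    # The ChainMap consults our reducer first and falls back to cloudpickle's
    # own table without writing into it.
    dispatch_table = ChainMap(
        {RLockType: _reduce_rlock}, cloudpickle.CloudPickler.dispatch_table
    )


def dumps_dropping_rlocks(obj):
    with io.BytesIO() as file:
        RLockDroppingPickler(file).dump(obj)
        return file.getvalue()


# Usage with the class from the original report.
class ThirdPartyClass:
    def __init__(self):
        self.internal_lock = threading.RLock()
        self.something_I_want_pickled = 'important string'


restored = pickle.loads(dumps_dropping_rlocks(ThirdPartyClass()))
print(restored.something_I_want_pickled)  # important string
```

Because the override lives on a subclass, plain cloudpickle.dumps calls elsewhere keep failing loudly on locks, which is arguably the safer default.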
ogrisel commented 2 years ago

Do we agree that ThirdPartyClass is not picklable with the standard library's pickle module either?

Why would cloudpickle behave any differently in this case?
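The standard library behaviour is easy to check directly. This sketch reuses ThirdPartyClass from the original report; stdlib_pickle_error is a hypothetical helper name. On CPython it should report a TypeError about the _thread.RLock object, the same failure cloudpickle hits.

```python
import pickle
import threading


class ThirdPartyClass:
    # Same class as in the original report.
    def __init__(self):
        self.internal_lock = threading.RLock()
        self.something_I_want_pickled = 'important string'


def stdlib_pickle_error(obj):
    """Return the TypeError raised by the standard pickle module, or None."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return exc
    return None


print(stdlib_pickle_error(ThirdPartyClass()))
```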

I think it's the responsibility of the ThirdPartyClass author to implement a __reduce__ or __reduce_ex__ method that makes this class picklable by handling the lock explicitly at pickling/unpickling time.
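For example, the class author could write something like the following. This is a hypothetical variant of ThirdPartyClass whose constructor takes the value as a parameter, and the __reduce__ body is one possible policy (snapshot the data under the lock, recreate the lock on load), not the only one.

```python
import pickle
import threading


class ThirdPartyClass:
    def __init__(self, something_I_want_pickled='important string'):
        self.internal_lock = threading.RLock()
        self.something_I_want_pickled = something_I_want_pickled

    def __reduce__(self):
        # Read the state while holding the lock so the snapshot is consistent.
        with self.internal_lock:
            state = self.something_I_want_pickled
        # Rebuild by calling the constructor with that state. The lock is never
        # serialized: every unpickled copy gets its own fresh RLock in __init__.
        return type(self), (state,)


original = ThirdPartyClass('important string')
restored = pickle.loads(pickle.dumps(original))
print(restored.something_I_want_pickled)  # important string
```

Implementing __getstate__/__setstate__ to drop and recreate the lock would work equally well; __reduce__ is shown because it is the method named above.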

A generic, silent workaround implemented in cloudpickle could break the protection the lock is meant to provide and, depending on the application, lead to silent data corruption and other extremely hard-to-debug problems.
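To make the risk concrete, here is a minimal sketch using only the standard library. The hypothetical helper dumps_dropping_locks applies the same idea as the workaround in this thread (replace every RLock with a fresh one at unpickling time), and lock_is_free_for_other_threads, also hypothetical, probes a lock from a second thread. The original lock is held, yet the unpickled copy's lock is free, so whatever it was guarding is silently unprotected.

```python
import copyreg
import io
import pickle
import threading
from _thread import RLock as RLockType  # concrete RLock type in CPython


def dumps_dropping_locks(obj):
    """Pickle obj, silently replacing every RLock with a fresh, unlocked one."""
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf)
    # Per-instance dispatch table: a copy, so the global copyreg table is untouched.
    pickler.dispatch_table = copyreg.dispatch_table.copy()
    pickler.dispatch_table[RLockType] = lambda lock: (RLockType, ())
    pickler.dump(obj)
    return buf.getvalue()


def lock_is_free_for_other_threads(lock):
    """Report whether a different thread could acquire lock right now."""
    result = []

    def probe():
        acquired = lock.acquire(blocking=False)
        if acquired:
            lock.release()
        result.append(acquired)

    worker = threading.Thread(target=probe)
    worker.start()
    worker.join()
    return result[0]


# The main thread holds the lock, e.g. in the middle of updating "data".
shared = {"lock": threading.RLock(), "data": [1, 2, 3]}
shared["lock"].acquire()
clone = pickle.loads(dumps_dropping_locks(shared))

print(lock_is_free_for_other_threads(shared["lock"]))  # False: still held
print(lock_is_free_for_other_threads(clone["lock"]))   # True: protection silently lost
shared["lock"].release()
```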

ogrisel commented 2 years ago

To restate the scope and objective of cloudpickle: its goal is to make it possible to pickle interactively defined Python functions or instances of interactively defined classes, typically so that distributed clusters running several Python processes in parallel can be used from an interactive development environment (e.g. a Jupyter notebook, or a simple Python script with code defined in the __main__ module).

Pickling instances of arbitrary classes of third-party libraries that are not picklable by default for various reasons is not a goal of cloudpickle.