DataONEorg / hashstore

HashStore, a hash-based object store for DataONE data packages
Apache License 2.0

Add multiprocessing process locks & shared list #98

Closed: doulikecookiedough closed this issue 1 month ago

doulikecookiedough commented 1 month ago

In Python, there are two main paths to concurrency and parallelism: the threading module and the multiprocessing module. Currently, only a threading lock is used to synchronize the storing and deleting of objects. A threading lock is not shared between processes, so it provides no synchronization when HashStore is called through multiprocessing.
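To illustrate the point above, here is a minimal sketch (not HashStore code) in which a child process holds a lock while the parent tries a non-blocking acquire on the same lock object. The helper names `_hold_lock` and `parent_is_blocked` are hypothetical, and the sketch assumes the POSIX-only "fork" start method so the child inherits the lock without pickling.

```python
import multiprocessing
import threading

# Assumption: "fork" start method (POSIX only), so the child inherits objects as-is.
ctx = multiprocessing.get_context("fork")


def _hold_lock(lock, acquired, release):
    """Child process: take the lock, signal the parent, hold until told to stop."""
    with lock:
        acquired.set()
        release.wait(timeout=5)


def parent_is_blocked(lock):
    """Return True if the parent cannot acquire `lock` while a child holds it."""
    acquired = ctx.Event()
    release = ctx.Event()
    child = ctx.Process(target=_hold_lock, args=(lock, acquired, release))
    child.start()
    if not acquired.wait(timeout=5):
        child.terminate()
        raise RuntimeError("child never acquired the lock")
    # Non-blocking attempt; passed positionally because the keyword name differs
    # between threading.Lock (blocking) and multiprocessing.Lock (block).
    got_it = lock.acquire(False)
    if got_it:
        lock.release()
    release.set()
    child.join()
    return not got_it
```

With a `threading.Lock()`, the child only locks its own copy, so `parent_is_blocked` returns `False`. With `ctx.Lock()`, the lock is backed by a shared semaphore and the function returns `True`.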

To Do:

Example Code:


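import multiprocessing
import os

# Process-safe lock and a Manager-backed list visible to every process
reference_lock_mp = multiprocessing.Lock()
reference_locked_cids_mp = multiprocessing.Manager().list()  # Create a shared list

# Inside a HashStore method, where the objects above are instance attributes:
use_multiprocessing = os.getenv("USE_MULTIPROCESSING", "False") == "True"
if use_multiprocessing:
    while cid in self.reference_locked_cids_mp:
        ...
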
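The elided loop body above is not shown in the issue. As a rough, self-contained sketch of one way such a wait-and-claim pattern could work (not the merged HashStore implementation), the following uses a process lock plus a Manager list to let only one process at a time work on a given cid. The helpers `claim_cid`, `release_cid`, `_worker`, and `run_demo` are hypothetical, and the "fork" start method is again assumed.

```python
import multiprocessing
import time

# Assumption: "fork" start method (POSIX only).
ctx = multiprocessing.get_context("fork")


def claim_cid(cid, lock, locked_cids, poll_interval=0.01):
    """Block until `cid` is free, then mark it as in use."""
    while True:
        with lock:
            # Check and append under one lock so two processes cannot both claim cid.
            if cid not in locked_cids:
                locked_cids.append(cid)
                return
        time.sleep(poll_interval)


def release_cid(cid, lock, locked_cids):
    """Mark `cid` as free again."""
    with lock:
        locked_cids.remove(cid)


def _worker(cid, lock, locked_cids, log):
    claim_cid(cid, lock, locked_cids)
    try:
        log.append("enter")   # start of the critical section
        time.sleep(0.02)      # stand-in for storing or deleting an object
        log.append("exit")    # end of the critical section
    finally:
        release_cid(cid, lock, locked_cids)


def run_demo(workers=4, cid="abc123"):
    """Run several processes against one cid; return the event log and leftover cids."""
    with ctx.Manager() as manager:
        lock = ctx.Lock()
        locked_cids = manager.list()
        log = manager.list()
        procs = [
            ctx.Process(target=_worker, args=(cid, lock, locked_cids, log))
            for _ in range(workers)
        ]
        for proc in procs:
            proc.start()
        for proc in procs:
            proc.join()
        return list(log), list(locked_cids)
```

If the synchronization holds, the log returned by `run_demo()` strictly alternates between "enter" and "exit", and no cid is left in the shared list.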
doulikecookiedough commented 1 month ago

This has been completed via https://github.com/DataONEorg/hashstore/pull/103.

Documented how to switch HashStore from threading synchronization to multiprocessing. There may be a more elegant way to do this, but I feel this is good enough at this time.
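For reference, a small sketch of how a switch based on the `USE_MULTIPROCESSING` variable from the example code might select a lock type. The function names `multiprocessing_enabled` and `make_reference_lock` are hypothetical, and whether the merged change uses exactly this form is not stated in this thread.

```python
import multiprocessing
import os
import threading


def multiprocessing_enabled(environ=None):
    """Mirror the issue's check: only the exact string "True" enables the switch."""
    environ = os.environ if environ is None else environ
    return environ.get("USE_MULTIPROCESSING", "False") == "True"


def make_reference_lock(environ=None):
    """Return a process lock when the switch is on, otherwise a threading lock."""
    if multiprocessing_enabled(environ):
        return multiprocessing.Lock()
    return threading.Lock()
```

Note that with this exact-match comparison, values such as "true" or "1" leave the switch off.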