chanzuckerberg / cryoet-data-portal

CryoET Data Portal
MIT License

Feature Request: Make the Python API client thread safe #1130

Open uermel opened 1 month ago

uermel commented 1 month ago

Making the Python API client thread safe would make it much easier to process data from the API in parallel.

For instance, the following is a common pattern for parallel processing in Python, but it currently fails with the exception shown after the code:

from cryoet_data_portal import Client, Run
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Tuple, Any

def get_vs(run: Run) -> Tuple[Run, Any]:
    if len(run.tomogram_voxel_spacings) == 0:
        return run, None
    return run, run.tomogram_voxel_spacings[0]

client = Client()
runs = Run.find(client)

with ThreadPoolExecutor(max_workers=8) as executor:
    futures = []
    for run in runs:
        futures.append(executor.submit(get_vs, run))

    for future in as_completed(futures):
        run, vs = future.result()
        if vs is None:
            print(run.id)

TransportAlreadyConnected: Transport is already connected

uermel commented 1 month ago

cc: @jgadling

andy-sweet commented 1 month ago

As a workaround, you may be able to create a new instance of Client for each task, as is done in the napari plugin.
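
A minimal sketch of that workaround, written generically so it does not depend on the portal package: each task builds its own client instead of sharing one across threads. The names `map_with_own_client`, `make_client`, and `fetch` are hypothetical helpers for illustration, not part of cryoet_data_portal.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List


def map_with_own_client(
    ids: Iterable[Any],
    make_client: Callable[[], Any],
    fetch: Callable[[Any, Any], Any],
    max_workers: int = 8,
) -> List[Any]:
    """Run fetch(client, item_id) for every id, one fresh client per task.

    make_client and fetch are placeholders supplied by the caller; nothing
    here is specific to any particular API client.
    """

    def task(item_id: Any) -> Any:
        # A new client per task, so no connection is shared between threads.
        client = make_client()
        return fetch(client, item_id)

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in the same order as the input ids.
        return list(executor.map(task, ids))
```

With the portal, this might be called with the run IDs collected up front (for example `[run.id for run in runs]`), `make_client=Client`, and a `fetch` that re-fetches the run through the task's own client and reads `tomogram_voxel_spacings` inside the task. That assumes `Run.get_by_id(client, run_id)` is available in the installed version, which is not confirmed in this thread. Creating a client per task adds some overhead, so this is a stopgap rather than a substitute for a thread-safe client.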