czbiohub-sf / shrimPy

shrimPy: Smart High-throughput Robust Imaging & Measurement in Python
BSD 3-Clause "New" or "Revised" License

Slurmkit scripts slowing down the I/O #111

Open edyoshikun opened 10 months ago

edyoshikun commented 10 months ago

Problem

Running scripts such as stabilization and cropping was slowing down I/O for other users @talonchandler @ieivanov

After some debugging, I believe I've nailed down the issue. This issue can be replicated in two ways with datasets that have lots of positions and timepoints and/or channels.

  1. Most Slurm scripts parallelize over T and C and submit an individual job for each position. Keep in mind that, since we parallelize over these two dimensions, each multiprocessing pool spawns n subprocesses. So if all positions are allocated at once, each job spawns its own set of child processes, causing a large number of simultaneous I/O calls.
  2. Make a Slurm script that requests more CPUs, more memory, and more simultaneous subprocesses. Doing this runs into the same issue: multiple jobs run in parallel, and their concurrent I/O calls reduce our overall throughput.
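To make the scaling in both cases concrete, here is a rough back-of-the-envelope sketch. The function name and the numbers are hypothetical, chosen only for illustration; the point is that the number of processes touching the filesystem at once grows as (jobs running) × (pool workers per job).

```python
def concurrent_io_processes(n_jobs_running: int, workers_per_job: int) -> int:
    """Upper bound on the number of processes doing I/O at the same time."""
    return n_jobs_running * workers_per_job


# Hypothetical numbers, not measured on any real dataset.
n_positions = 100      # one Slurm job per position, all allocated at once
workers_per_job = 16   # multiprocessing pool size used to parallelize over T and C

print(concurrent_io_processes(n_positions, workers_per_job))  # 1600
```

Limiting either factor (fewer jobs running at once, or fewer workers per job) lowers this bound.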

Proposed solutions:

    for i in range(0, len(input_position_dirpaths), batch_size):
        chunk_input_paths = input_position_dirpaths[i : i + batch_size]
        if i == 0:
            # First batch: submit without dependencies
            register_jobs = [
                submit_function(
                    register_func,
                    slurm_params=params,
                    input_data_path=in_path,
                    output_path=out_path,
                )
                for in_path in chunk_input_paths
            ]
        else:
            # Later batches: depend on the jobs of the previous batch
            register_jobs = [
                submit_function(
                    register_func,
                    slurm_params=params,
                    input_data_path=in_path,
                    output_path=out_path,
                    dependencies=register_jobs,
                )
                for in_path in chunk_input_paths
            ]
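The batching idea can be checked without a cluster. Below is a minimal, self-contained sketch that swaps the real submit call for a hypothetical stand-in (`make_fake_submit`) that only records what would be submitted. `submit_in_batches` and its parameters are illustrative names, not part of shrimPy or slurmkit, and the sketch assumes a dependency means "start only after all listed jobs have finished".

```python
from typing import Callable, Dict, List, Sequence


def make_fake_submit(log: List[Dict]) -> Callable:
    """Build a hypothetical stand-in for a Slurm submit call that logs each job."""

    def fake_submit(func, input_data_path, output_path, dependencies=None):
        job_id = len(log)  # sequential ids: 0, 1, 2, ...
        log.append(
            {"id": job_id, "input": input_data_path, "deps": list(dependencies or [])}
        )
        return job_id

    return fake_submit


def submit_in_batches(
    input_paths: Sequence[str],
    output_path: str,
    batch_size: int,
    submit: Callable,
    func: Callable = None,
) -> List[int]:
    """Submit one job per input path; each batch depends on the previous batch."""
    all_jobs: List[int] = []
    previous_jobs: List[int] = []
    for i in range(0, len(input_paths), batch_size):
        chunk = input_paths[i : i + batch_size]
        previous_jobs = [
            submit(
                func,
                input_data_path=path,
                output_path=output_path,
                # The first batch has no predecessors, so pass None.
                dependencies=previous_jobs or None,
            )
            for path in chunk
        ]
        all_jobs.extend(previous_jobs)
    return all_jobs


log: List[Dict] = []
submit_in_batches(["p0", "p1", "p2", "p3", "p4"], "out.zarr", 2, make_fake_submit(log))
for entry in log:
    print(entry["id"], entry["input"], entry["deps"])
```

Under that assumption, at most `batch_size` jobs are active at any time, which caps the number of processes competing for I/O. The trade-off is that a batch cannot start until the slowest job of the previous batch has finished.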