Large memory usage when running through CLI #735

WeilerP commented 1 year ago

@carsen-stringer, is there a guideline on the expected memory usage? I was following this guide from 10x to process a Xenium sample but I keep running out of memory when running cellpose with the CLI. So far, I requested up to 800GB of RAM but still ran out of memory.

To extract and save the stack level of interest, I am using

import tifffile

def get_tif_image(fpath: str, stack_level: int):
    with tifffile.TiffFile(fpath) as tif:
        image = tif.series[0].levels[stack_level].asarray()

    return image

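def write_tif_image(image, fpath: str):
    # Call reconstructed; only the tile and metadata arguments survived in the post.
    tifffile.imwrite(fpath, image, tile=(1024, 1024), metadata={'axes': 'ZYX'})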

which gives me an image of size (12, 10208, 7814). I then run cellpose via the CLI using

python -m cellpose --dir ${output_dir} --pretrained_model nuclei --chan 0 --chan2 0 --img_filter _morphology.ome --diameter ${pixel_diameter} --do_3D --save_tif --use_gpu --verbose

Is there something inherently wrong with this pipeline that would explain the large memory usage?
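For scale, the raw array is small next to the RAM requested. The following is a rough estimate only, assuming uint16 voxels (the dtype is not stated in the post); the helper name array_gib is made up for illustration.

```python
import numpy as np

def array_gib(shape, dtype):
    """Return the in-memory size of a dense array of this shape and dtype, in GiB."""
    n_voxels = int(np.prod(shape, dtype=np.int64))
    return n_voxels * np.dtype(dtype).itemsize / 2**30

shape = (12, 10208, 7814)  # image size reported above
print(f"uint16:  {array_gib(shape, np.uint16):.2f} GiB")   # assumed on-disk dtype
print(f"float32: {array_gib(shape, np.float32):.2f} GiB")  # one float32 copy of the stack
```

Even a handful of float32 copies of the stack would stay in the tens of GiB, so the raw image alone does not account for hundreds of GB.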

When I am running everything through a Python script, I can successfully run the segmentation with a peak memory usage of less than 150GB.

import tifffile

from cellpose import io, models

def get_image(file_path):
    return tifffile.imread(file_path)

if __name__ == "__main__":
    file_path = DATA_DIR / "melanoma" / "region_4" / f"level_{STACK_LEVEL}" / f"level_{STACK_LEVEL}_morphology.ome.tif"
    image = get_image(file_path)

    model = models.Cellpose(model_type="nuclei", gpu=True)

    channels = [0, 0]
    masks, flows, _, diameters = model.eval(image, channels=channels, diameter=10, do_3D=True, progress=True)

    io.masks_flows_to_seg(images=image, masks=masks, flows=flows, diams=diameters, file_names=file_path, channels=channels)
    io.save_masks(images=image, masks=masks, flows=flows, file_names=file_path, png=False, tif=True, channels=channels)
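One way to obtain a peak memory figure like the one quoted above is to query the process at the end of the script. This is a minimal sketch using the standard library's resource module (Unix only). The helper name peak_rss_gib is hypothetical, and the number covers host RAM of the current process only, not GPU memory.

```python
import resource
import sys

def peak_rss_gib():
    """Return this process's peak resident set size in GiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    bytes_used = peak if sys.platform == "darwin" else peak * 1024
    return bytes_used / 2**30

if __name__ == "__main__":
    # Call this after the segmentation and saving steps have finished.
    print(f"peak memory: {peak_rss_gib():.2f} GiB")
```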

I am running everything in Python 3.9.

Thanks in advance for your help/input and let me know if you need/want any other information!

natelharrison commented 1 year ago

I am running into a similar issue: cellpose uses in excess of 500GB of RAM on a relatively small image.

Here is the batch script I am using:

#SBATCH --qos=generic_qos
#SBATCH --gres=gpu:1
#SBATCH --partition=generic_partition
#SBATCH --account=generic_account
#SBATCH --nodes=1
#SBATCH --time=24:00:00
#SBATCH --ntasks=20
#SBATCH --mem=500G
#SBATCH --output=/path/to/log/cellpose_run.log
#SBATCH --export=ALL

### Run your command
. /path/to/anaconda3/etc/profile.d/
conda activate cellpose


python -m cellpose --dir $dir --pretrained_model $model --savedir $save_path --add_model $model_path --do_3D --no_npy --save_tif --verbose --use_gpu
Run log:

2023-07-09 17:50:34,098 [INFO] WRITING LOG OUTPUT TO /global/home/users/user/.cellpose/run.log
2023-07-09 17:50:34,099 [INFO] cellpose version: 2.2 platform: linux python version: 3.10.11 torch version: 1.12.0
2023-07-09 17:50:36,882 [INFO] ** TORCH CUDA version installed and working. **
2023-07-09 17:50:36,882 [INFO] >>>> using GPU
2023-07-09 17:50:36,894 [INFO] >>>> running cellpose on 8 images using chan_to_seg GRAY and chan (opt) NONE
2023-07-09 17:50:36,894 [INFO] >> cellpose_residual_on_style_on_concatenation_off_256_crops_2023_07_08_21_25_09.996033 << model set to be used
2023-07-09 17:50:37,150 [INFO] >>>> model diam_mean = 30.000 (ROIs rescaled to this size during training)
2023-07-09 17:50:37,151 [INFO] >>>> model diam_labels = 40.970 (mean diameter of training ROIs)
2023-07-09 17:50:37,151 [INFO] >>>> using diameter 30.000 for all images
2023-07-09 17:50:37,157 [INFO] 0%| | 0/8 [00:00

mrariden commented 1 year ago

This is interesting and suggests some kind of memory bug in the CLI version. I'll look into this.

parkjosh-broadinstitute commented 1 year ago

Hello, has anyone been able to solve this issue? I am running into the same issues trying to run Cellpose on our Xenium data.

Tagging for visibility: @WeilerP @Myrkgod @mrariden

natelharrison commented 1 year ago

If you only need the masks, you can run cellpose through a Python script and save the mask output of model.eval() with tifffile.imwrite(). That's what I am doing here, with some extra code for tiling the image and running predictions in parallel, though I haven't implemented restitching the segmentations. It is essentially the same code @WeilerP is using, except that it calls tifffile.imwrite() instead of io.save_masks.
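A minimal sketch of that approach, assuming masks is the integer label array returned by model.eval(). The helper names compact_label_dtype and save_masks_tif are made up for illustration and are not part of cellpose; the downcast is optional and only there to keep the file small.

```python
import numpy as np

def compact_label_dtype(masks):
    """Cast a label array to uint16 if its largest label fits, otherwise uint32."""
    max_label = int(masks.max()) if masks.size else 0
    dtype = np.uint16 if max_label <= np.iinfo(np.uint16).max else np.uint32
    return masks.astype(dtype, copy=False)

def save_masks_tif(masks, path):
    """Write only the masks to a TIFF file, skipping the .npy and figure outputs."""
    import tifffile  # imported here so the dtype helper works without tifffile installed
    tifffile.imwrite(path, compact_label_dtype(masks))
```

With the script posted earlier in the thread, this would replace the two io calls: after masks, flows, _, diameters = model.eval(...), call save_masks_tif(masks, "masks.tif") with an output path of your choice.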

natelharrison commented 1 year ago

@mrariden I was also wrong about the discrepancy between CLI and GUI memory use; I'm not sure why memory use was lower in the GUI at the time. I do find it odd, though, that save_masks takes up so much memory when the entire mask should only be a few GB in size. Also, is there any downside to using tifffile to save the masks if they are all I need?

mrariden commented 1 year ago

@parkjosh-broadinstitute, @WeilerP For the moment:

I have not been able to identify the cause of the OOM; it might lie in the interaction with bash/SLURM. My recommendation for now is to run cellpose via a Python script, since that seems to work in most cases (mine included).

If you really want to use the CLI, then you can manually tile your images with an overlap and save the flows. The scripts I have are not general enough to share but you can stitch the flows together, divide out the overlapping regions, and run dynamics.compute_masks() on the full size, stitched flows. This gets around the memory error by saving tiles to disk instead of holding them in memory.
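The tiling and stitching steps could look roughly like the sketch below. It is not the script referred to above: the function names are hypothetical, overlaps are averaged with uniform weights (a tapered weighting would be a reasonable alternative), and tiles are kept in a list here rather than loaded from disk. The final dynamics.compute_masks() call on the stitched flows is left out because its arguments depend on the cellpose version.

```python
import numpy as np

def tile_slices(shape, tile, overlap):
    """Yield (y_slice, x_slice) pairs that cover a 2D shape with overlapping tiles."""
    step = tile - overlap
    if step <= 0:
        raise ValueError("tile must be larger than overlap")
    height, width = shape
    for y0 in range(0, max(height - overlap, 1), step):
        for x0 in range(0, max(width - overlap, 1), step):
            yield slice(y0, min(y0 + tile, height)), slice(x0, min(x0 + tile, width))

def stitch_tiles(tiles, shape):
    """Average per-tile arrays into one array of the given shape.

    tiles is an iterable of ((y_slice, x_slice), values); values may carry
    leading axes (for example a flow-component axis) before the two spatial axes.
    """
    total = np.zeros(shape, dtype=np.float32)
    count = np.zeros(shape, dtype=np.float32)
    for (ys, xs), values in tiles:
        total[..., ys, xs] += values
        count[..., ys, xs] += 1
    # Dividing by the visit count "divides out" the overlapping regions.
    return total / np.maximum(count, 1)

# Stand-in for per-tile flows: crop a known field, then stitch it back together.
field = np.random.default_rng(0).random((2, 50, 37)).astype(np.float32)
tiles = [((ys, xs), field[..., ys, xs]) for ys, xs in tile_slices(field.shape[-2:], 16, 4)]
stitched = stitch_tiles(tiles, field.shape)
```

In a real run each tile's flows would come from a separate cellpose call and be read back from disk one at a time, so only the accumulators need to fit in memory.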

@Myrkgod the .npy file currently holds data for the entire image itself plus the masks, flows, and outlines. In an upcoming PR, we're removing the image data to speed up saving the .npy file to disk. There's no issue with using tifffile to save masks if that solves your problem.

natelharrison commented 1 year ago

> If you really want to use the CLI, then you can manually tile your images with an overlap and save the flows. The scripts I have are not general enough to share but you can stitch the flows together, divide out the overlapping regions, and run dynamics.compute_masks() on the full size, stitched flows. This gets around the memory error by saving tiles to disk instead of holding them in memory.

@mrariden Sorry, this is unrelated to the original issue, but do you know if this will perform the same as running the model on smaller crops? The model seems to perform better on smaller crops of my image because of the range of cell sizes and brightness, so I'd like to split the image and run each crop separately.

Edit: I've not tried this, but from my experience with Omnipose I am going to say yes, it will perform the same or similar.

mrariden commented 11 months ago

@WeilerP can you check out the solution here?

From my research, the memory issue is related to the matplotlib figure creation, which isn't usually needed as an output. I've removed it and saw much better memory usage and run times. I think this will be how we implement the solution.

mrariden commented 10 months ago

This should be resolved with the latest merge.