MouseLand / cellpose

a generalist algorithm for cellular segmentation with human-in-the-loop capabilities
https://www.cellpose.org/
BSD 3-Clause "New" or "Revised" License

Large memory usage when running through CLI #735

Closed · WeilerP closed this issue 10 months ago

WeilerP commented 1 year ago

@carsen-stringer, is there a guideline on expected memory usage? I was following this guide from 10x to process a Xenium sample, but I keep running out of memory when running cellpose through the CLI. So far, I have requested up to 800GB of RAM and still run out of memory.

To extract and save the stack level of interest, I am using

import tifffile

def get_tif_image(fpath: str, stack_level: int):
    """Read one pyramid level of the multi-resolution OME-TIFF into memory."""
    with tifffile.TiffFile(fpath) as tif:
        image = tif.series[0].levels[stack_level].asarray()

    return image

def write_tif_image(image, fpath: str):
    """Write the ZYX stack back out as a tiled, lossily compressed TIFF."""
    tifffile.imwrite(
        fpath,
        image,
        photometric='minisblack',
        dtype='uint16',
        tile=(1024, 1024),
        compression='JPEG_2000_LOSSY',
        metadata={'axes': 'ZYX'},
    )

which gives me an image of size (12, 10208, 7814). I then run cellpose via the CLI using

python -m cellpose --dir ${output_dir} --pretrained_model nuclei --chan 0 --chan2 0 --img_filter _morphology.ome --diameter ${pixel_diameter} --do_3D --save_tif --use_gpu --verbose

Is there something inherently wrong with this pipeline that would explain the large memory usage?

When I run everything through a Python script instead, the segmentation succeeds with a peak memory usage below 150GB.

import tifffile

from cellpose import io, models

def get_image(file_path):
    return tifffile.imread(file_path)

if __name__ == "__main__":
    # DATA_DIR (a pathlib.Path) and STACK_LEVEL are defined elsewhere
    file_path = DATA_DIR / "melanoma" / "region_4" / f"level_{STACK_LEVEL}" / f"level_{STACK_LEVEL}_morphology.ome.tif"
    image = get_image(file_path)

    # nuclei model on GPU; channels=[0, 0] means grayscale, no second channel
    model = models.Cellpose(model_type="nuclei", gpu=True)

    channels = [0, 0]
    masks, flows, _, diameters = model.eval(image, channels=channels, diameter=10, do_3D=True, progress=True)

    # save the *_seg.npy file and the mask TIFF
    io.masks_flows_to_seg(images=image, masks=masks, flows=flows, diams=diameters, file_names=file_path, channels=channels)
    io.save_masks(images=image, masks=masks, flows=flows, file_names=file_path, png=False, tif=True, channels=channels)
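As an aside, one way to record the peak memory of such a run on Linux (an illustrative addition, not part of the original script):

```python
import resource

# On Linux, ru_maxrss reports the process's peak resident set size in kilobytes
peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"peak RSS: {peak_kb * 1024 / 1e9:.1f} GB")
```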

I am running everything in Python 3.9 with

tifffile==2023.4.12
cellpose==2.2.2
torch==2.0.1
All packages and versions:

```
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
anyio 3.7.0 pypi_0 pypi
argon2-cffi 21.3.0 pypi_0 pypi
argon2-cffi-bindings 21.2.0 pypi_0 pypi
arrow 1.2.3 pypi_0 pypi
asttokens 2.2.1 pypi_0 pypi
async-lru 2.0.2 pypi_0 pypi
attrs 23.1.0 pypi_0 pypi
babel 2.12.1 pypi_0 pypi
backcall 0.2.0 pypi_0 pypi
beautifulsoup4 4.12.2 pypi_0 pypi
bleach 6.0.0 pypi_0 pypi
ca-certificates 2023.05.30 h06a4308_0
cellpose 2.2.2 pypi_0 pypi
certifi 2023.5.7 pypi_0 pypi
cffi 1.15.1 pypi_0 pypi
charset-normalizer 3.1.0 pypi_0 pypi
cmake 3.26.4 pypi_0 pypi
comm 0.1.3 pypi_0 pypi
debugpy 1.6.7 pypi_0 pypi
decorator 5.1.1 pypi_0 pypi
defusedxml 0.7.1 pypi_0 pypi
exceptiongroup 1.1.1 pypi_0 pypi
executing 1.2.0 pypi_0 pypi
fastjsonschema 2.17.1 pypi_0 pypi
fastremap 1.13.5 pypi_0 pypi
filelock 3.12.2 pypi_0 pypi
fqdn 1.5.1 pypi_0 pypi
idna 3.4 pypi_0 pypi
imagecodecs 2023.3.16 pypi_0 pypi
importlib-metadata 6.7.0 pypi_0 pypi
ipykernel 6.23.2 pypi_0 pypi
ipython 8.14.0 pypi_0 pypi
ipywidgets 8.0.6 pypi_0 pypi
isoduration 20.11.0 pypi_0 pypi
jedi 0.18.2 pypi_0 pypi
jinja2 3.1.2 pypi_0 pypi
json5 0.9.14 pypi_0 pypi
jsonpointer 2.4 pypi_0 pypi
jsonschema 4.17.3 pypi_0 pypi
jupyter-client 8.2.0 pypi_0 pypi
jupyter-core 5.3.1 pypi_0 pypi
jupyter-events 0.6.3 pypi_0 pypi
jupyter-lsp 2.2.0 pypi_0 pypi
jupyter-server 2.6.0 pypi_0 pypi
jupyter-server-terminals 0.4.4 pypi_0 pypi
jupyterlab 4.0.2 pypi_0 pypi
jupyterlab-pygments 0.2.2 pypi_0 pypi
jupyterlab-server 2.23.0 pypi_0 pypi
jupyterlab-widgets 3.0.7 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1
libffi 3.4.4 h6a678d5_0
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libstdcxx-ng 11.2.0 h1234567_1
lit 16.0.6 pypi_0 pypi
llvmlite 0.40.1rc1 pypi_0 pypi
markupsafe 2.1.3 pypi_0 pypi
matplotlib-inline 0.1.6 pypi_0 pypi
mistune 2.0.5 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
natsort 8.3.1 pypi_0 pypi
nbclient 0.8.0 pypi_0 pypi
nbconvert 7.5.0 pypi_0 pypi
nbformat 5.9.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0
nest-asyncio 1.5.6 pypi_0 pypi
networkx 3.1 pypi_0 pypi
notebook-shim 0.2.3 pypi_0 pypi
numba 0.57.0 pypi_0 pypi
numpy 1.24.3 pypi_0 pypi
nvidia-cublas-cu11 11.10.3.66 pypi_0 pypi
nvidia-cuda-cupti-cu11 11.7.101 pypi_0 pypi
nvidia-cuda-nvrtc-cu11 11.7.99 pypi_0 pypi
nvidia-cuda-runtime-cu11 11.7.99 pypi_0 pypi
nvidia-cudnn-cu11 8.5.0.96 pypi_0 pypi
nvidia-cufft-cu11 10.9.0.58 pypi_0 pypi
nvidia-curand-cu11 10.2.10.91 pypi_0 pypi
nvidia-cusolver-cu11 11.4.0.1 pypi_0 pypi
nvidia-cusparse-cu11 11.7.4.91 pypi_0 pypi
nvidia-nccl-cu11 2.14.3 pypi_0 pypi
nvidia-nvtx-cu11 11.7.91 pypi_0 pypi
opencv-python-headless 4.7.0.72 pypi_0 pypi
openssl 3.0.8 h7f8727e_0
overrides 7.3.1 pypi_0 pypi
packaging 23.1 pypi_0 pypi
pandocfilters 1.5.0 pypi_0 pypi
parso 0.8.3 pypi_0 pypi
pexpect 4.8.0 pypi_0 pypi
pickleshare 0.7.5 pypi_0 pypi
pip 23.1.2 py39h06a4308_0
platformdirs 3.6.0 pypi_0 pypi
prometheus-client 0.17.0 pypi_0 pypi
prompt-toolkit 3.0.38 pypi_0 pypi
psutil 5.9.5 pypi_0 pypi
ptyprocess 0.7.0 pypi_0 pypi
pure-eval 0.2.2 pypi_0 pypi
pycparser 2.21 pypi_0 pypi
pygments 2.15.1 pypi_0 pypi
pyrsistent 0.19.3 pypi_0 pypi
python 3.9.16 h955ad1f_3
python-dateutil 2.8.2 pypi_0 pypi
python-json-logger 2.0.7 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
pyzmq 25.1.0 pypi_0 pypi
readline 8.2 h5eee18b_0
requests 2.31.0 pypi_0 pypi
rfc3339-validator 0.1.4 pypi_0 pypi
rfc3986-validator 0.1.1 pypi_0 pypi
roifile 2023.5.12 pypi_0 pypi
scipy 1.10.1 pypi_0 pypi
send2trash 1.8.2 pypi_0 pypi
setuptools 67.8.0 py39h06a4308_0
six 1.16.0 pypi_0 pypi
sniffio 1.3.0 pypi_0 pypi
soupsieve 2.4.1 pypi_0 pypi
spalma 0.0.0 pypi_0 pypi
sqlite 3.41.2 h5eee18b_0
stack-data 0.6.2 pypi_0 pypi
sympy 1.12 pypi_0 pypi
terminado 0.17.1 pypi_0 pypi
tifffile 2023.4.12 pypi_0 pypi
tinycss2 1.2.1 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
tomli 2.0.1 pypi_0 pypi
torch 2.0.1 pypi_0 pypi
tornado 6.3.2 pypi_0 pypi
tqdm 4.65.0 pypi_0 pypi
traitlets 5.9.0 pypi_0 pypi
triton 2.0.0 pypi_0 pypi
typing-extensions 4.6.3 pypi_0 pypi
tzdata 2023c h04d1e81_0
uri-template 1.2.0 pypi_0 pypi
urllib3 2.0.3 pypi_0 pypi
wcwidth 0.2.6 pypi_0 pypi
webcolors 1.13 pypi_0 pypi
webencodings 0.5.1 pypi_0 pypi
websocket-client 1.6.0 pypi_0 pypi
wheel 0.38.4 py39h06a4308_0
widgetsnbextension 4.0.7 pypi_0 pypi
xz 5.4.2 h5eee18b_0
zipp 3.15.0 pypi_0 pypi
zlib 1.2.13 h5eee18b_0
```

Thanks in advance for your help/input and let me know if you need/want any other information!

natelharrison commented 1 year ago

I am also running into a similar issue where cellpose is using in excess of 500GB of RAM on a relatively small image.

Here is the batch script I am using:

#!/bin/sh
#SBATCH --qos=generic_qos
#SBATCH --gres=gpu:1
#SBATCH --partition=generic_partition
#SBATCH --account=generic_account
#SBATCH --nodes=1
#SBATCH --time=24:00:00
#SBATCH --ntasks=20
#SBATCH --mem=500G
#SBATCH --output=/path/to/log/cellpose_run.log
#SBATCH --export=ALL

### Run your command
. /path/to/anaconda3/etc/profile.d/conda.sh
conda activate cellpose

dir="/path/to/raw_images/processed"
image_path="/path/to/rotated_cropped_data/image.tif" 
save_path="/path/to/trained_model_outputs/256_default"
model_path="/path/to/rotated_cropped_data/256_crops/models/model_name"
model="model_name"

python -m cellpose --dir $dir --pretrained_model $model --savedir $save_path --add_model $model_path --do_3D --no_npy --save_tif --verbose --use_gpu
Run log:

```
2023-07-09 17:50:34,098 [INFO] WRITING LOG OUTPUT TO /global/home/users/user/.cellpose/run.log
2023-07-09 17:50:34,099 [INFO] cellpose version: 2.2 platform: linux python version: 3.10.11 torch version: 1.12.0
2023-07-09 17:50:36,882 [INFO] ** TORCH CUDA version installed and working. **
2023-07-09 17:50:36,882 [INFO] >>>> using GPU
2023-07-09 17:50:36,894 [INFO] >>>> running cellpose on 8 images using chan_to_seg GRAY and chan (opt) NONE
2023-07-09 17:50:36,894 [INFO] >> cellpose_residual_on_style_on_concatenation_off_256_crops_2023_07_08_21_25_09.996033 << model set to be used
2023-07-09 17:50:37,150 [INFO] >>>> model diam_mean = 30.000 (ROIs rescaled to this size during training)
2023-07-09 17:50:37,151 [INFO] >>>> model diam_labels = 40.970 (mean diameter of training ROIs)
2023-07-09 17:50:37,151 [INFO] >>>> using diameter 30.000 for all images
2023-07-09 17:50:37,157 [INFO] 0%| | 0/8 [00:00
```
Packages:

```
# packages in environment at /global/home/users/natelharrison/anaconda3/envs/cellpose:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
absl-py 1.4.0 pypi_0 pypi
array-record 0.2.0 pypi_0 pypi
astunparse 1.6.3 pypi_0 pypi
blas 1.0 mkl
bzip2 1.0.8 h7b6447c_0
ca-certificates 2023.05.30 h06a4308_0
cachetools 5.3.0 pypi_0 pypi
cellpose 2.2 pypi_0 pypi
certifi 2022.12.7 pypi_0 pypi
charset-normalizer 3.1.0 pypi_0 pypi
click 8.1.3 pypi_0 pypi
cmake 3.26.3 pypi_0 pypi
cuda 11.6.1 0 nvidia
cuda-cccl 11.6.55 hf6102b2_0 nvidia
cuda-command-line-tools 11.6.2 0 nvidia
cuda-compiler 11.6.2 0 nvidia
cuda-cudart 11.6.55 he381448_0 nvidia
cuda-cudart-dev 11.6.55 h42ad0f4_0 nvidia
cuda-cuobjdump 11.6.124 h2eeebcb_0 nvidia
cuda-cupti 11.6.124 h86345e5_0 nvidia
cuda-cuxxfilt 11.6.124 hecbf4f6_0 nvidia
cuda-driver-dev 11.6.55 0 nvidia
cuda-gdb 12.1.105 0 nvidia
cuda-libraries 11.6.1 0 nvidia
cuda-libraries-dev 11.6.1 0 nvidia
cuda-memcheck 11.8.86 0 nvidia
cuda-nsight 12.1.105 0 nvidia
cuda-nsight-compute 12.1.1 0 nvidia
cuda-nvcc 11.6.124 hbba6d2d_0 nvidia
cuda-nvdisasm 12.1.105 0 nvidia
cuda-nvml-dev 11.6.55 haa9ef22_0 nvidia
cuda-nvprof 12.1.105 0 nvidia
cuda-nvprune 11.6.124 he22ec0a_0 nvidia
cuda-nvrtc 11.6.124 h020bade_0 nvidia
cuda-nvrtc-dev 11.6.124 h249d397_0 nvidia
cuda-nvtx 11.6.124 h0630a44_0 nvidia
cuda-nvvp 12.1.105 0 nvidia
cuda-runtime 11.6.1 0 nvidia
cuda-samples 11.6.101 h8efea70_0 nvidia
cuda-sanitizer-api 12.1.105 0 nvidia
cuda-toolkit 11.6.1 0 nvidia
cuda-tools 11.6.1 0 nvidia
cuda-visual-tools 11.6.1 0 nvidia
cudatoolkit 11.3.1 h2bc3f7f_2
dm-tree 0.1.8 pypi_0 pypi
etils 1.2.0 pypi_0 pypi
fastremap 1.13.4 pypi_0 pypi
filelock 3.12.0 pypi_0 pypi
flatbuffers 23.3.3 pypi_0 pypi
gast 0.4.0 pypi_0 pypi
gds-tools 1.6.1.9 0 nvidia
google-api-core 2.11.0 pypi_0 pypi
google-auth 2.17.3 pypi_0 pypi
google-auth-oauthlib 1.0.0 pypi_0 pypi
google-cloud-core 2.3.2 pypi_0 pypi
google-cloud-storage 2.8.0 pypi_0 pypi
google-crc32c 1.5.0 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
google-resumable-media 2.4.1 pypi_0 pypi
googleapis-common-protos 1.59.0 pypi_0 pypi
grpcio 1.54.0 pypi_0 pypi
h5py 3.8.0 pypi_0 pypi
idna 3.4 pypi_0 pypi
imagecodecs 2023.3.16 pypi_0 pypi
importlib-resources 5.12.0 pypi_0 pypi
intel-openmp 2023.1.0 hdb19cb5_46305
jax 0.4.8 pypi_0 pypi
keras 2.12.0 pypi_0 pypi
keras-cv 0.4.2 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1
libclang 16.0.0 pypi_0 pypi
libcublas 11.9.2.110 h5e84587_0 nvidia
libcublas-dev 11.9.2.110 h5c901ab_0 nvidia
libcufft 10.7.1.112 hf425ae0_0 nvidia
libcufft-dev 10.7.1.112 ha5ce4c0_0 nvidia
libcufile 1.6.1.9 0 nvidia
libcufile-dev 1.6.1.9 0 nvidia
libcurand 10.3.2.106 0 nvidia
libcurand-dev 10.3.2.106 0 nvidia
libcusolver 11.3.4.124 h33c3c4e_0 nvidia
libcusparse 11.7.2.124 h7538f96_0 nvidia
libcusparse-dev 11.7.2.124 hbbe9722_0 nvidia
libffi 3.4.4 h6a678d5_0
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libnpp 11.6.3.124 hd2722f0_0 nvidia
libnpp-dev 11.6.3.124 h3c42840_0 nvidia
libnvjpeg 11.6.2.124 hd473ad6_0 nvidia
libnvjpeg-dev 11.6.2.124 hb5906b9_0 nvidia
libstdcxx-ng 11.2.0 h1234567_1
libuuid 1.41.5 h5eee18b_0
lit 16.0.1 pypi_0 pypi
llvmlite 0.39.1 pypi_0 pypi
markdown 3.4.3 pypi_0 pypi
markupsafe 2.1.1 py310h7f8727e_0
mkl 2023.1.0 h6d00ec8_46342
ml-dtypes 0.1.0 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
natsort 8.3.1 pypi_0 pypi
ncurses 6.4 h6a678d5_0
networkx 3.1 pypi_0 pypi
nsight-compute 2023.1.1.4 0 nvidia
numba 0.56.4 pypi_0 pypi
numpy 1.23.5 pypi_0 pypi
nvidia-cublas-cu11 11.10.3.66 pypi_0 pypi
nvidia-cuda-cupti-cu11 11.7.101 pypi_0 pypi
nvidia-cuda-nvrtc-cu11 11.7.99 pypi_0 pypi
nvidia-cuda-runtime-cu11 11.7.99 pypi_0 pypi
nvidia-cudnn-cu11 8.5.0.96 pypi_0 pypi
nvidia-cufft-cu11 10.9.0.58 pypi_0 pypi
nvidia-curand-cu11 10.2.10.91 pypi_0 pypi
nvidia-cusolver-cu11 11.4.0.1 pypi_0 pypi
nvidia-cusparse-cu11 11.7.4.91 pypi_0 pypi
nvidia-nccl-cu11 2.14.3 pypi_0 pypi
nvidia-nvtx-cu11 11.7.91 pypi_0 pypi
oauthlib 3.2.2 pypi_0 pypi
opencv-python-headless 4.7.0.72 pypi_0 pypi
openssl 3.0.9 h7f8727e_0
opt-einsum 3.3.0 pypi_0 pypi
packaging 23.1 pypi_0 pypi
pip 23.1.2 py310h06a4308_0
promise 2.3 pypi_0 pypi
protobuf 4.22.3 pypi_0 pypi
psutil 5.9.5 pypi_0 pypi
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pygments 2.15.1 pypi_0 pypi
pyqt5 5.15.9 pypi_0 pypi
pyqt5-qt5 5.15.2 pypi_0 pypi
pyqt5-sip 12.12.0 pypi_0 pypi
pyqtgraph 0.13.3 pypi_0 pypi
python 3.10.11 h955ad1f_3
pytorch 1.12.0 py3.10_cuda11.3_cudnn8.3.2_0 pytorch
pytorch-cuda 11.6 h867d48c_1 pytorch
pytorch-mutex 1.0 cuda pytorch
qtpy 2.3.1 pypi_0 pypi
readline 8.2 h5eee18b_0
regex 2023.3.23 pypi_0 pypi
requests 2.28.2 pypi_0 pypi
requests-oauthlib 1.3.1 pypi_0 pypi
rsa 4.9 pypi_0 pypi
scipy 1.10.1 pypi_0 pypi
setuptools 67.8.0 py310h06a4308_0
six 1.16.0 pypi_0 pypi
sqlite 3.41.2 h5eee18b_0
superqt 0.4.1 pypi_0 pypi
tbb 2021.8.0 hdb19cb5_0
tensorboard 2.12.2 pypi_0 pypi
tensorboard-data-server 0.7.0 pypi_0 pypi
tensorboard-plugin-wit 1.8.1 pypi_0 pypi
tensorflow 2.12.0 pypi_0 pypi
tensorflow-datasets 4.9.2 pypi_0 pypi
tensorflow-estimator 2.12.0 pypi_0 pypi
tensorflow-io-gcs-filesystem 0.32.0 pypi_0 pypi
tensorflow-metadata 1.13.1 pypi_0 pypi
termcolor 2.2.0 pypi_0 pypi
tifffile 2023.4.12 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
toml 0.10.2 pypi_0 pypi
torch 2.0.0 pypi_0 pypi
tqdm 4.65.0 pypi_0 pypi
triton 2.0.0 pypi_0 pypi
typing-extensions 4.5.0 pypi_0 pypi
typing_extensions 4.6.3 py310h06a4308_0
tzdata 2023c h04d1e81_0
urllib3 1.26.15 pypi_0 pypi
werkzeug 2.2.3 pypi_0 pypi
wheel 0.38.4 py310h06a4308_0
wrapt 1.14.1 pypi_0 pypi
xz 5.4.2 h5eee18b_0
zipp 3.15.0 pypi_0 pypi
zlib 1.2.13 h5eee18b_0
```
mrariden commented 1 year ago

> When I run everything through a Python script instead, the segmentation succeeds with a peak memory usage below 150GB.

This is interesting and suggests some kind of memory bug in the CLI version. I'll look into this.

parkjosh-broadinstitute commented 1 year ago

Hello, has anyone been able to solve this issue? I am running into the same issues trying to run Cellpose on our Xenium data.

Tagging for visibility: @WeilerP @Myrkgod @mrariden

natelharrison commented 1 year ago

> Hello, has anyone been able to solve this issue? I am running into the same issues trying to run Cellpose on our Xenium data.
>
> Tagging for visibility: @WeilerP @Myrkgod @mrariden

If you only need the masks, you can try running cellpose through a Python script and saving the mask output of model.eval() with tifffile.imwrite(). That's what I am doing here, with some extra code for tiling the image and running predictions in parallel, though I haven't implemented restitching the segmentations together. It's essentially the same as the code @WeilerP is using, but with tifffile.imwrite() instead of io.save_masks.
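A minimal sketch of that approach, assuming a single-channel 3D stack and hypothetical file names (not the exact script referenced above):

```python
import tifffile
from cellpose import models

# hypothetical input: a single-channel ZYX stack
image = tifffile.imread("level_2_morphology.ome.tif")

model = models.Cellpose(model_type="nuclei", gpu=True)
masks, flows, styles, diams = model.eval(image, channels=[0, 0], diameter=10, do_3D=True)

# save only the label image, skipping io.save_masks and the *_seg.npy file entirely
tifffile.imwrite("masks.tif", masks.astype("uint32"), compression="zlib")
```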

natelharrison commented 1 year ago

@mrariden I was also wrong about the discrepancy between CLI and GUI memory use; I'm not sure what caused the lower usage in the GUI at the time. Still, I find it odd that save_masks uses so much memory when the entire mask should only be a few GBs in size. Also, is there any downside to using tifffile to save the masks if they are all I need?
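For scale, a quick back-of-the-envelope check of that "few GBs" estimate, using the (12, 10208, 7814) image from the original post and assuming a uint32 label volume:

```python
import numpy as np

# 12 * 10208 * 7814 voxels at 4 bytes per uint32 label
nbytes = 12 * 10208 * 7814 * np.dtype(np.uint32).itemsize
print(f"{nbytes / 1e9:.1f} GB")  # ~3.8 GB
```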

mrariden commented 1 year ago

@parkjosh-broadinstitute, @WeilerP For the moment:

I've not been able to identify the cause of the OOM; it might be at the interface with bash/SLURM. My recommendation for now is to run cellpose via a Python script, since that seems to work in most cases (mine included).

If you really want to use the CLI, you can manually tile your images with an overlap and save the flows. The scripts I have are not general enough to share, but you can stitch the flows together, divide out the overlapping regions, and run dynamics.compute_masks() on the full-size, stitched flows. This gets around the memory error by saving tiles to disk instead of holding them in memory.
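A rough 2D sketch of that tiling idea, with hypothetical tile and overlap sizes and signatures as in cellpose 2.x (3D adds a z component to the flows, and the real scripts are more involved):

```python
import numpy as np
import tifffile
from cellpose import dynamics, models

TILE, OVERLAP = 2048, 256  # hypothetical sizes

image = tifffile.imread("image.tif")  # (Y, X)
H, W = image.shape
model = models.CellposeModel(model_type="nuclei", gpu=True)

# accumulate flows (dP) and cell probability over overlapping tiles
dP = np.zeros((2, H, W), np.float32)
cellprob = np.zeros((H, W), np.float32)
counts = np.zeros((H, W), np.float32)

for y0 in range(0, H, TILE - OVERLAP):
    for x0 in range(0, W, TILE - OVERLAP):
        y1, x1 = min(y0 + TILE, H), min(x0 + TILE, W)
        _, flows, _ = model.eval(image[y0:y1, x0:x1], channels=[0, 0], diameter=10)
        dP[:, y0:y1, x0:x1] += flows[1]     # flows[1] is the flow field dP
        cellprob[y0:y1, x0:x1] += flows[2]  # flows[2] is the cell probability
        counts[y0:y1, x0:x1] += 1

dP /= counts        # divide out the overlapping regions
cellprob /= counts

# run the mask-reconstruction step once on the stitched, full-size flows
masks, _ = dynamics.compute_masks(dP, cellprob)
```

In a real pipeline the per-tile flows would be written to disk and loaded back for stitching, which is what keeps peak memory down.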

@Myrkgod the .npy file currently holds data for the entire image itself plus the masks, flows, and outlines. In an upcoming PR, we're removing the image data to speed up saving the .npy file to disk. There's no issue with using tifffile to save masks if that solves your problem.
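For reference, a *_seg.npy file is a pickled dict and can be inspected like this (hypothetical file name; the 'img' entry is the image data being removed in that PR):

```python
import numpy as np

# 'img' currently holds the full image, which is why the file is large and slow to write
dat = np.load("example_seg.npy", allow_pickle=True).item()
print(dat.keys())  # includes 'img', 'masks', 'outlines', 'flows', ...
masks = dat["masks"]
```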

natelharrison commented 1 year ago

> If you really want to use the CLI, you can manually tile your images with an overlap and save the flows. The scripts I have are not general enough to share, but you can stitch the flows together, divide out the overlapping regions, and run dynamics.compute_masks() on the full-size, stitched flows. This gets around the memory error by saving tiles to disk instead of holding them in memory.

@mrariden Sorry, this is unrelated to the original issue, but do you know if this will perform the same as running the model on smaller crops? My image seems to segment better on smaller crops due to a range of cell sizes and brightnesses, so I'd like to split it and run each crop separately.

Edit: I've not tried this, but from my experience with Omnipose I am going to say yes, it will perform the same or similarly.

mrariden commented 11 months ago

@WeilerP can you check out the solution here?

From my research, the memory issue is related to matplotlib figure creation, which usually isn't needed as an output. I've removed it and saw much better memory usage and run times. I think this is how we will implement the solution.
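As general background rather than the actual patch: matplotlib keeps every figure alive in a global registry until it is closed explicitly, so creating a figure per image in a long loop grows memory steadily. A minimal illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no GUI event loop
import matplotlib.pyplot as plt

for i in range(100):
    fig, ax = plt.subplots()
    ax.imshow([[0, 1], [1, 0]])
    fig.savefig(f"overlay_{i}.png")
    plt.close(fig)  # without this, all 100 figures stay resident
```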

mrariden commented 10 months ago

This should be resolved with the latest merge.