CellProfiler / CellProfiler-plugins

Community-contributed and experimental CellProfiler modules.
http://plugins.cellprofiler.org/

RunCellpose_Issues with GPU memory share setting #236

Open sugan89 opened 4 months ago

sugan89 commented 4 months ago

The RunCellpose plugin works well in a Python environment when the "GPU memory share for each worker" option is set to 1, but when the option is set to 0.1, I get the following error:


```
** TORCH CUDA version installed and working. **
>>>> using GPU
>>>> model diam_mean =  30.000 (ROIs rescaled to this size during training)
>>>> model diam_labels =  34.352 (mean diameter of training ROIs)
Unable to create masks. Check your module settings. CUDA out of memory. Tried to allocate 98.00 MiB. GPU 0 has a total capacity of 4.00 GiB of which 2.86 GiB is free. Of the allocated memory 254.49 MiB is allocated by PyTorch, and 97.51 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Failed to run module RunCellpose
Traceback (most recent call last):
  File "C:\Users\ssivagur\Anaconda3\envs\CP_plugins\lib\site-packages\cellprofiler\gui\pipelinecontroller.py", line 3390, in do_step
    self.__pipeline.run_module(module, workspace_model)
  File "C:\Users\ssivagur\Anaconda3\envs\CP_plugins\lib\site-packages\cellprofiler_core\pipeline\_pipeline.py", line 1349, in run_module
    module.run(workspace)
  File "C:\Users\ssivagur\Documents\GitHub\CellProfiler-plugins\active_plugins\runcellpose.py", line 606, in run
    y.segmented = y_data
UnboundLocalError: local variable 'y_data' referenced before assignment
```

bethac07 commented 4 months ago

Looking at the torch documentation, the function we use to do the memory subsetting CLAIMS it works off a fraction of the total memory, so the allocation should fit in 10%. I have not yet checked whether this is a known bug.
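
For reference, here is a rough arithmetic check of the numbers in the error message against a 10% cap. It is only a sketch under assumptions: it assumes the plugin caps memory with something like `torch.cuda.set_per_process_memory_fraction`, that the cap is computed from the 4.00 GiB total, and that a 98.00 MiB request cannot reuse the 97.51 MiB of reserved-but-unallocated memory and so needs a fresh block. The helper name `fits_under_cap` is made up for illustration, and the real allocator (rounding, fragmentation) is more complicated than this.

```python
MIB_PER_GIB = 1024


def fits_under_cap(total_gib, share, allocated_mib, reserved_free_mib, request_mib):
    """Return (cap_mib, needed_mib, fits) for one new allocation under a fractional cap.

    Hypothetical helper: a simplified model, not PyTorch's actual allocator logic.
    """
    cap_mib = total_gib * MIB_PER_GIB * share
    already_reserved_mib = allocated_mib + reserved_free_mib
    if request_mib <= reserved_free_mib:
        # Assumption: the request can be served from memory PyTorch already holds.
        needed_mib = already_reserved_mib
    else:
        # Assumption: a new block has to be reserved on top of what is already held.
        needed_mib = already_reserved_mib + request_mib
    return cap_mib, needed_mib, needed_mib <= cap_mib


# Values copied from the error message above; share = 0.1 is the failing setting.
cap, needed, fits = fits_under_cap(4.00, 0.1, 254.49, 97.51, 98.00)
print(f"cap {cap:.1f} MiB, needed {needed:.1f} MiB, fits: {fits}")
```

Under these assumptions the cap is about 409.6 MiB while the process would need about 450 MiB, so the out-of-memory error may simply be the cap working as documented rather than a torch bug, even though 2.86 GiB of the card is free. With the share set to 1 the same request fits easily.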

ShataDg commented 2 months ago

Might be related - https://forum.image.sc/t/cellprofiler-plugins-cellpose-stardist-gpu-memory-in-test-mode/95938

imagesc-bot commented 1 month ago

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/runcellpose-error/86328/13

ErinWeisbart commented 1 month ago

FYI, I just got the same UnboundLocalError: local variable 'y_data' referenced before assignment from line 606. I have Use GPU set to No in my pipeline. Running in Docker erinweisbart/distributed-cellprofiler:2.0.0_4.2.4_cellpose

EDIT: after updating my plugins on the Docker with a fresh git pull the error goes away!
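
For anyone hitting the same traceback: the combination of "Unable to create masks. ..." followed by `UnboundLocalError` looks like an exception that was caught and printed while the variable assigned inside the `try` block was still used afterwards. The sketch below only illustrates that pattern. It is not the actual `runcellpose.py` code, and `segment`, `run_buggy`, and `run_fixed` are hypothetical names.

```python
def segment(image, fail=False):
    """Hypothetical stand-in for the segmentation call; can fail like a CUDA OOM."""
    if fail:
        raise RuntimeError("CUDA out of memory.")
    return [[0, 1], [1, 1]]


def run_buggy(image, fail=False):
    try:
        y_data = segment(image, fail)
    except Exception as err:
        # The error is only printed, so execution continues past the except block.
        print(f"Unable to create masks. Check your module settings. {err}")
    # If segment() raised, y_data was never assigned and this line raises
    # UnboundLocalError, hiding the original problem.
    return y_data


def run_fixed(image, fail=False):
    try:
        y_data = segment(image, fail)
    except Exception as err:
        # Re-raise with context so the real cause is what the user sees.
        raise RuntimeError(
            f"Unable to create masks. Check your module settings. {err}"
        ) from err
    return y_data
```

If that is what happens here, the `UnboundLocalError` is a secondary symptom: any failure inside the segmentation step (GPU out of memory, or something unrelated when Use GPU is set to No) would surface the same way, which would explain why it shows up in both reports.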