MouseLand / cellpose

a generalist algorithm for cellular segmentation with human-in-the-loop capabilities
https://www.cellpose.org/
BSD 3-Clause "New" or "Revised" License

[INSTALL] Mac M1-M3 support Cellpose v3 #886

Closed erikgerdtsson closed 2 weeks ago

erikgerdtsson commented 6 months ago

I was not able to get the GPU to work for v3 (GUI) on my MacBook Pro M3.

python -m cellpose --gpu_device mps --use_gpu

I followed the instructions in the Cellpose documentation for M1 Macs, which reference Peter Sobolewski's branch (Cellpose v2) with GPU support, which seems to rely on PyTorch. That seems to work better.

https://github.com/psobolewskiPhD/cellpose/tree/feature/add_MPS_device

conda create --name cellpose-dev python=3.9 -y
conda activate cellpose-dev
conda install napari
git clone https://github.com/psobolewskiPhD/cellpose.git
cd cellpose
git fetch
git switch feature/add_MPS_device
conda install imagecodecs -y
pip install -e .

python -m cellpose

Are there any plans to release a mac M1-M3 version of v3?

668

Thanks for an amazing tool.
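Before launching Cellpose with `--gpu_device mps --use_gpu`, it can help to confirm that the installed PyTorch build actually exposes the MPS backend. The following is a minimal sketch, not part of Cellpose: the helper name `pick_device_name` is hypothetical, and it only relies on the standard PyTorch availability checks.

```python
try:
    import torch
except ImportError:  # keep the sketch importable without PyTorch
    torch = None


def pick_device_name():
    """Return "mps", "cuda", or "cpu" depending on what PyTorch reports.

    Hypothetical helper for illustration; Cellpose has its own device logic.
    """
    if torch is None:
        return "cpu"
    # torch.backends.mps is absent in older PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print("PyTorch would use:", pick_device_name())
```

If this reports `cpu` on an Apple Silicon machine, the problem is more likely the PyTorch installation than Cellpose itself.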

carsen-stringer commented 6 months ago

I'm sorry, for some reason Apple has not added torch support for double-precision computations, as Nvidia has. We need to test whether there is a loss in performance with single precision, as in @psobolewskiPhD's version of the code, before we can merge in those changes. Thank you, Peter, for your efforts on this.

erikgerdtsson commented 6 months ago

I see, thank you.

psobolewskiPhD commented 6 months ago

I'll try to revisit that and make a PR if there is still something special about my branch.

LauraBreimann commented 5 months ago

I would also be really keen for either an updated version from Peter or an integration into v3 in general!

I saw significant speed improvements using the GPU, but the "cyto3" model performs better on my cells. I tried to load the cyto3 model as a pretrained model into v2 (from Peter's branch) but couldn't get that to work somehow.

Thank you all!

OratHelm commented 2 months ago

@LauraBreimann, @erikgerdtsson I made a fork of Cellpose 3.0.10 which allows you to use the GPU or CPU with the GUI on Apple Silicon.

From what I've tested, it works well, both from the GUI and from a Python script. And it was fairly easy to set up, so if anyone can bring it cleanly into the main branch that would be great. Normally, what I've done in my fork shouldn't affect other configurations (see changes).

However, I couldn't get the training to work, unfortunately. It does use the GPU, there are no errors but the generated model doesn't find any cells, and when using the GUI it shows:

[INFO] 0, train_loss=1.4174, test_loss=0.0000, LR=0.0000, time 0.60s
[INFO] 5, train_loss=nan, test_loss=0.0000, LR=0.0556, time 1.25s
[INFO] 10, train_loss=nan, test_loss=0.0000, LR=0.1000, time 1.76s

and so on. And using a Python script:

[INFO] 0, train_loss=nan, test_loss=0.0000, LR=0.0000, time 0.51s
[INFO] 5, train_loss=nan, test_loss=0.0000, LR=0.0556, time 1.11s
[INFO] 10, train_loss=nan, test_loss=0.0000, LR=0.1000, time 1.62s

Unfortunately, I can't figure out where the problem is coming from. If anyone has any ideas, that would be great.
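When comparing runs like the two above, it can be useful to find the first logged epoch at which `train_loss` becomes `nan`. The sketch below parses log lines in the format quoted in this comment; the function names are hypothetical and the regular expression assumes the line layout stays exactly as shown.

```python
import math
import re

# Matches lines such as:
# [INFO] 5, train_loss=nan, test_loss=0.0000, LR=0.0556, time 1.25s
LOG_LINE = re.compile(
    r"\[INFO\]\s+(\d+),\s+train_loss=([^,\s]+),\s+test_loss=([^,\s]+),"
    r"\s+LR=([^,\s]+),\s+time\s+([\d.]+)s"
)


def parse_training_log(text):
    """Return one dict per matching log line, in order of appearance."""
    rows = []
    for match in LOG_LINE.finditer(text):
        epoch, train, test, lr, seconds = match.groups()
        rows.append({
            "epoch": int(epoch),
            "train_loss": float(train),  # float("nan") parses to NaN
            "test_loss": float(test),
            "lr": float(lr),
            "seconds": float(seconds),
        })
    return rows


def first_nan_epoch(rows):
    """Return the first epoch whose train_loss is NaN, or None if there is none."""
    for row in rows:
        if math.isnan(row["train_loss"]):
            return row["epoch"]
    return None


GUI_LOG = """
[INFO] 0, train_loss=1.4174, test_loss=0.0000, LR=0.0000, time 0.60s
[INFO] 5, train_loss=nan, test_loss=0.0000, LR=0.0556, time 1.25s
[INFO] 10, train_loss=nan, test_loss=0.0000, LR=0.1000, time 1.76s
"""

if __name__ == "__main__":
    print(first_nan_epoch(parse_training_log(GUI_LOG)))
```

In the GUI log the loss is finite at epoch 0 and `nan` from the next logged epoch on, while the script log is `nan` from the start, so the two runs may not be failing at the same point.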

carsen-stringer commented 2 months ago

Thank you, this looks great! Indeed, torch mps now supports double, so all the inference should work. Just curious: what is the speedup compared to the CPU?

Happy to merge this in the near future, even without training support, but it would be really nice to have that. It looks like one step in the network is perhaps not implemented for autograd with mps: https://github.com/pytorch/pytorch/issues?q=is%3Aissue+mps+autograd+nan. But we don't have any uncommon steps in the network, so I'm not sure which would fail.
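One general way to narrow down which backward step yields `nan` is PyTorch's autograd anomaly detection, which raises an error naming the offending backward function. The sketch below is not Cellpose code: `nan_backward_report` and `toy_loss` are hypothetical names, and the toy loss (a square root evaluated at zero) is only a stand-in that is known to produce a `nan` gradient. On a real run, the callable would build the loss from the model on the `mps` device.

```python
import warnings

try:
    import torch
except ImportError:  # keep the sketch importable without PyTorch
    torch = None


def nan_backward_report(make_loss):
    """Run one forward and backward pass under anomaly detection.

    Returns the error message naming the backward function that produced
    NaN, or None if the pass completed cleanly (or PyTorch is missing).
    """
    if torch is None:
        return None
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # anomaly mode warns about its overhead
        try:
            # The forward pass must also run inside the context manager.
            with torch.autograd.detect_anomaly():
                make_loss().backward()
        except RuntimeError as err:
            return str(err)
    return None


def toy_loss():
    """Stand-in loss: d/dx sqrt(x) at x=0 is inf, and 0 * inf gives NaN."""
    x = torch.zeros(1, requires_grad=True)
    return (torch.sqrt(x) * 0).sum()


if __name__ == "__main__" and torch is not None:
    print(nan_backward_report(toy_loss))
```

Anomaly detection slows training considerably, so it is best enabled for a single batch while debugging and then removed.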

OratHelm commented 2 months ago

For the following tests, I used an M2 with 8 CPU cores and 8 GPU cores, which I think is the smallest GPU offered by Apple. On a few images, segmentation with cyto3 was 36% faster and denoising approximately 7 times faster on the GPU compared to the CPU. And I think the results would be even more interesting for training...
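A comparison like the one above can be reproduced with a small timing harness. This is a sketch under stated assumptions: `best_time` and `speedup` are hypothetical helpers, and the callables passed in would wrap whatever segmentation or denoising call is being measured. Because GPU work in PyTorch can be asynchronous, the timed callable should wait for the device to finish (for example via `torch.mps.synchronize()`) before returning.

```python
import time


def best_time(fn, repeats=3):
    """Return the fastest wall-clock time, in seconds, over several calls to fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(cpu_seconds, gpu_seconds):
    """Return how many times faster the GPU run was than the CPU run."""
    return cpu_seconds / gpu_seconds


if __name__ == "__main__":
    # Placeholder workloads; replace with the CPU and GPU calls being compared.
    cpu = best_time(lambda: sum(range(200_000)))
    gpu = best_time(lambda: sum(range(100_000)))
    print(f"speedup: {speedup(cpu, gpu):.2f}x")
```

Reporting the ratio of times avoids the ambiguity of a phrase like "36% faster", which could mean either 36% less time or 1.36 times the throughput.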

carsen-stringer commented 1 month ago

The training is working fine for me on an M3 with Python 3.11 and PyTorch 2.4.0!!

carsen-stringer commented 1 month ago

@OratHelm could you please open a pull request with your fork?

OratHelm commented 1 month ago

It works for me too, with Python 3.9.19 and PyTorch 2.4.0! 🥳 I just opened the pull request (#1003).

carsen-stringer commented 1 month ago

Amazing thanks!

Makinadefuego commented 2 weeks ago

I had the same problem and I've tried everything, thank you.

Always 0.0 ACC

carsen-stringer commented 2 weeks ago

You mean the training loss is zero? These updates from @OratHelm are in v3.0.11

cstrlln commented 2 weeks ago

> You mean the training loss is zero? These updates from @OratHelm are in v3.0.11

@carsen-stringer that's great. Where can I find v3.0.11? I noticed the GitHub version is v3.0.10.
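To check which Cellpose version is installed locally and whether it is at least the v3.0.11 mentioned above, a small script using the standard library is enough. This is a sketch: `installed_version`, `version_tuple`, and `is_at_least` are hypothetical helper names, and the comparison only looks at the first three numeric components of the version string.

```python
import re
from importlib import metadata


def installed_version(package="cellpose"):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn "3.0.11" into (3, 0, 11) so versions compare numerically."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def is_at_least(package="cellpose", minimum="3.0.11"):
    """Return True if the package is installed at version `minimum` or newer."""
    found = installed_version(package)
    return found is not None and version_tuple(found) >= version_tuple(minimum)


if __name__ == "__main__":
    print("installed:", installed_version())
    print("has v3.0.11 or newer:", is_at_least())
```

Comparing tuples of integers rather than raw strings matters here, since a plain string comparison would rank "3.0.9" above "3.0.10".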