Closed · erikgerdtsson closed this 2 weeks ago
I'm sorry, for some reason Apple has not added torch support for double-precision computations the way Nvidia has. We need to test whether there is a loss in performance with single precision, as in @psobolewskiPhD's version of the code, before we can merge those changes in. Thank you, Peter, for your efforts on this.
I see, thank you.
I'll try to revisit that and make a PR if there is still something special about my branch.
I would also be really keen for either an updated version from Peter or an integration into v3 in general!
I saw significant speed improvements using the GPU, but the "cyto3" model performs better on my cells. I tried to load the cyto3 model as a pretrained model into v2 (from Peter's branch), but couldn't get that to work.
Thank you all!
@LauraBreimann, @erikgerdtsson I made a fork of Cellpose 3.0.10 which allows you to use the GPU or CPU with the GUI on Apple Silicon.
From what I've tested, it works well, both from the GUI and from a Python script. It was fairly easy to set up, so if anyone can bring it cleanly into the main branch, that would be great. The changes in my fork shouldn't affect other configurations (see changes).
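As a rough illustration of how device selection works on Apple Silicon (the `pick_device` helper below is my own sketch, not part of the fork), PyTorch exposes the Metal backend as `mps`:

```python
import torch

def pick_device() -> torch.device:
    """Return the MPS device on Apple Silicon if available, else the CPU."""
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

# Example: allocate a tensor on the selected device.
device = pick_device()
x = torch.ones(2, 2, device=device)
print(device.type)
```

On an Apple Silicon machine with a recent PyTorch build this prints `mps`; elsewhere it falls back to `cpu`.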
However, I couldn't get the training to work, unfortunately. It does use the GPU and there are no errors, but the generated model doesn't find any cells, and when training from the GUI the log shows:
```
[INFO] 0, train_loss=1.4174, test_loss=0.0000, LR=0.0000, time 0.60s
[INFO] 5, train_loss=nan, test_loss=0.0000, LR=0.0556, time 1.25s
[INFO] 10, train_loss=nan, test_loss=0.0000, LR=0.1000, time 1.76s
```
and so on. And when using a Python script:
```
[INFO] 0, train_loss=nan, test_loss=0.0000, LR=0.0000, time 0.51s
[INFO] 5, train_loss=nan, test_loss=0.0000, LR=0.0556, time 1.11s
[INFO] 10, train_loss=nan, test_loss=0.0000, LR=0.1000, time 1.62s
```
Unfortunately, I can't figure out where the problem is coming from. If anyone has any ideas, that would be great.
Thank you, this looks great! Indeed, torch MPS now supports double precision, so all the inference should work. Just curious: what is the speed-up compared to the CPU?
Happy to merge this in the near future, even without training support, but it would be really nice to have that. It looks like one step in the network is perhaps not implemented for autograd with MPS: https://github.com/pytorch/pytorch/issues?q=is%3Aissue+mps+autograd+nan. But we don't have any uncommon steps in the network, so I'm not sure which one would fail.
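To help track down where the NaN first appears, one option (standard PyTorch, shown here on CPU purely for illustration; the toy graph below is not Cellpose's network) is autograd anomaly detection, which reports the backward op that first produces a NaN:

```python
import torch

# Anomaly detection makes the backward pass raise an error at the op that
# first produces a NaN, which can help localize a bad/unsupported MPS kernel.
torch.autograd.set_detect_anomaly(True)

x = torch.randn(4, 3, requires_grad=True)
loss = (x * 2.0).sum()
loss.backward()  # on a healthy graph this completes with finite gradients
print(torch.isfinite(x.grad).all().item())
```

Running the same training step on the `mps` device with this flag enabled should name the offending operation in the traceback.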
For the following tests, I used an M2 with 8 CPU cores and 8 GPU cores, which I think is the smallest GPU Apple offers. On a few images, segmentation with cyto3 was 36% faster and denoising was approximately 7× faster on the GPU compared to the CPU. And I think the results would be even more interesting for training...
The training is working fine for me on an M3 with Python 3.11 and PyTorch 2.4.0!!
@OratHelm could you please open a pull request with your fork?
It works for me too, with Python 3.9.19 and PyTorch 2.4.0! 🥳 I just opened the pull request (#1003)
Amazing thanks!
I had the same problem and I've tried everything, TY.
Always 0.0 ACC
You mean the training loss is zero? These updates from @OratHelm are in v3.0.11
@carsen-stringer that's great! Where can I find v3.0.11? I noticed the GitHub version is v3.0.10.
I was not able to get the GPU to work for v3 (GUI) on my MacBook Pro M3:
```
python -m cellpose --gpu_device mps --use_gpu
```
Following the instructions in the Cellpose documentation for M1 Macs, with reference to Peter Sobolewski's branch (Cellpose v2) with GPU support, which relies on PyTorch. That seems to work better:
https://github.com/psobolewskiPhD/cellpose/tree/feature/add_MPS_device
```
conda create --name cellpose-dev python=3.9 -y
conda activate cellpose-dev
conda install napari
git clone https://github.com/psobolewskiPhD/cellpose.git
cd cellpose
git fetch
git switch feature/add_MPS_device
conda install imagecodecs -y
pip install -e .
python -m cellpose
```
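A quick sanity check (assuming the environment above is active) to confirm PyTorch can see the MPS backend before launching the GUI:

```shell
# Prints True on Apple Silicon with a recent PyTorch build, False otherwise.
python -c "import torch; print(torch.backends.mps.is_available())"
```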
Are there any plans to release a mac M1-M3 version of v3?
Thanks for an amazing tool.