Thanks for the contribution, this looks great! Let's work towards making the demo run efficiently on NVIDIA GPUs, CPUs, and macOS alike.
@haotian-liu Thank you for these kind words! And in the first place, for this wonderful model.
So, I added device autodetection with the priority CUDA > MPS > CPU. The CPU fallback is used only when no GPU is available.
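The detection logic can be sketched roughly like this (the function name and structure are my own illustration, not necessarily the code in this PR; the guarded import is just so the sketch degrades gracefully when torch is missing):

```python
def autodetect_device() -> str:
    """Pick the best available PyTorch backend: CUDA > MPS > CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"  # torch not installed; nothing to detect
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in PyTorch >= 1.12, hence the getattr guard
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

The returned string can then be passed straight to `torch.device(...)` or `model.to(...)`.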
A very rough estimate on my MacBook Pro M1 (2021) (benchmark stats here):
For comparison, on an NVIDIA T4 (with xformers, without triton), I got 27s.
Hi @stared, thanks for the great work!
Several things:
@haotian-liu I made sure it works on my laptop. Sadly, I have no other Apple Silicon laptops to test on. Normally I would use GitHub Actions for that, but in this case I am not sure it is doable.
In any case, I have a MacBook Pro with 32GB RAM. As you know, this RAM is shared with the GPU, so it may matter a lot for model performance. I have heard that once memory usage reaches 80% or more, performance drops drastically.
I use mamba, a drop-in replacement for conda that is much faster at solving dependencies. At the time I started using mamba, it also had better support for arm64. In fact, regular conda still carries a disclaimer:
Apple silicon builds are experimental and haven't had testing like the other platforms
See more at https://github.com/conda-forge/miniforge. That said, I guess it should work with regular miniforge.
When it comes to versions, I bumped them up. As a side remark, I had to upgrade PyTorch, as 1.12.1 didn't support aten::index.Tensor on MPS. I was happy to see that progress is fast and that the current version supports this operation.
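For reference, plain advanced tensor indexing is one of the operations that lowers to aten::index.Tensor. A small probe (my own sketch, not code from this PR) can check whether the installed PyTorch build supports it on MPS:

```python
def mps_supports_advanced_indexing() -> bool:
    """Probe whether this PyTorch build runs aten::index.Tensor on MPS."""
    try:
        import torch
    except ImportError:
        return False
    mps = getattr(torch.backends, "mps", None)
    if mps is None or not mps.is_available():
        return False
    try:
        x = torch.arange(6, device="mps").reshape(2, 3)
        idx = torch.tensor([0, 1], device="mps")
        _ = x[idx]  # advanced indexing -> aten::index.Tensor
        return True
    except (NotImplementedError, RuntimeError):
        return False
```

On machines where this returns False, one workaround is setting the PYTORCH_ENABLE_MPS_FALLBACK=1 environment variable before importing torch, which makes unsupported ops fall back to the CPU.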
Thank you for the explanation! I am merging this pull request, and thank you again for the contribution!
I wanted to add the MPS backend (M1/M2 Apple Silicon GPUs) and regular CPU support. Performance-wise, these implementations won't be anywhere near CUDA + xformers. Still, I think it is worth adding for the sake of compatibility, to make it possible to run on any device, even if slowly.