apple / ml-stable-diffusion

Stable Diffusion with Core ML on Apple Silicon
MIT License
16.5k stars 890 forks

.all vs .cpuAndNeuralEngine? #122

Open pj4533 opened 1 year ago

pj4533 commented 1 year ago

I was assuming that "all" for compute units would choose the best one for your setup, but I tried cpuAndNeuralEngine and got better performance. What's happening here? What does "all" do vs specifying?

l2gakuen commented 1 year ago

Hey, in my tests on an M1 Max, .all balanced the load between the GPU and ANE rather than fully utilizing either, and it was slightly slower than GPU only. Generations with GPU only used ~90% GPU; generations with .all used ~60% GPU and ran about 0.25 step/sec(?) slower.

Just sharing my opinion, nothing official :D

msiracusa commented 1 year ago

> I was assuming that "all" for compute units would choose the best one for your setup, but I tried cpuAndNeuralEngine and got better performance. What's happening here? What does "all" do vs specifying?

Your assumption is correct. Specifying .all lets Core ML use any combination of compute units {CPU, GPU, Neural Engine} to optimize prediction latency. The fact that the more restricted .cpuAndNeuralEngine option is yielding better results than .all is a known issue on this set of models and certain systems.
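For anyone landing here, this is roughly how the choice is expressed in code. The compute-unit restriction is plain Core ML (`MLModelConfiguration.computeUnits`), which the Swift pipeline in this repo accepts at initialization. A minimal sketch, assuming the `resourcesAt:`/`configuration:` initializer labels; the resource path is a placeholder, and exact pipeline parameters may differ across versions of the package:

```swift
import CoreML
import StableDiffusion // Swift package from apple/ml-stable-diffusion

// Restrict which compute units Core ML may schedule onto.
// .all lets Core ML pick any combination of {CPU, GPU, ANE};
// .cpuAndNeuralEngine excludes the GPU, which this thread found
// faster on some systems.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine // or .all, .cpuAndGPU, .cpuOnly

// Placeholder path to the compiled Core ML resources; the pipeline
// forwards the configuration to Core ML when loading the models.
let resourceURL = URL(fileURLWithPath: "/path/to/Resources")
let pipeline = try StableDiffusionPipeline(
    resourcesAt: resourceURL,
    configuration: config)
```

Switching between `.all` and `.cpuAndNeuralEngine` is just this one `computeUnits` assignment, so it's cheap to benchmark both on your own hardware.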

pj4533 commented 1 year ago

> I was assuming that "all" for compute units would choose the best one for your setup, but I tried cpuAndNeuralEngine and got better performance. What's happening here? What does "all" do vs specifying?

> Your assumption is correct. Specifying .all lets Core ML use any combination of compute units {CPU, GPU, Neural Engine} to optimize prediction latency. The fact that the more restricted .cpuAndNeuralEngine option is yielding better results than .all is a known issue on this set of models and certain systems.

great! thanks for the reply.

pj4533 commented 1 year ago

> I was assuming that "all" for compute units would choose the best one for your setup, but I tried cpuAndNeuralEngine and got better performance. What's happening here? What does "all" do vs specifying?

> Your assumption is correct. Specifying .all lets Core ML use any combination of compute units {CPU, GPU, Neural Engine} to optimize prediction latency. The fact that the more restricted .cpuAndNeuralEngine option is yielding better results than .all is a known issue on this set of models and certain systems.

Just wanted to reopen this as a question... let me know if there is a better forum.

I noticed that when using .cpuAndNeuralEngine I occasionally get very bad performance. For example, normally I get about 1.9 step/s on my Mac mini M1, but sometimes that drops to about 0.08 step/s.

Some notes:

Anyway, I'm happy to open a separate issue if you think it's worth it, but figured I'd ask the question first since it's possible I am just not understanding something!