pj4533 opened this issue 1 year ago
Hey, in my tests on an M1 Max, AllComputeUnits balanced the load between the GPU and the ANE rather than scaling it, and it's slightly slower than GPU only. Generations with GPU only used 90% of the GPU; generations with AllComputeUnits used 60% of the GPU and ran about 0.25 step/s (?) slower.
Just sharing my opinion, nothing official :D
I was assuming that "all" for compute units would choose the best one for your setup, but I tried cpuAndNeuralEngine and got better performance. What's happening here? What does "all" do vs specifying?
Your assumption is correct. Specifying `.all` lets Core ML use any combination of compute units {CPU, GPU, Neural Engine} to optimize prediction latency. The fact that the more restricted `.cpuAndNeuralEngine` option is yielding better results than `.all` is a known issue on this set of models and certain systems.
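Since `.all` is not guaranteed to be the fastest choice for every model and machine, one practical approach is to measure each configuration and keep the fastest. The sketch below is library-agnostic: it times any zero-argument step callable, so it runs without Core ML, and the helper names `steps_per_second` and `pick_fastest` are hypothetical. In practice each callable would run one prediction on a model loaded with a different compute-unit setting, e.g. `MLModelConfiguration.computeUnits` in Swift, or the `compute_units=` argument of `coremltools.models.MLModel` (`ct.ComputeUnit.ALL`, `ct.ComputeUnit.CPU_AND_NE`, ...) in Python.

```python
import time
from typing import Callable, Dict, Tuple


def steps_per_second(step: Callable[[], None], n_steps: int = 10, warmup: int = 2) -> float:
    """Run `warmup` untimed calls, then time `n_steps` calls; return steps per second."""
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    # Untimed warm-up calls absorb one-time setup costs.
    for _ in range(warmup):
        step()
    start = time.perf_counter()
    for _ in range(n_steps):
        step()
    # Guard against a zero reading if the step is trivially fast.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return n_steps / elapsed


def pick_fastest(
    candidates: Dict[str, Callable[[], None]], n_steps: int = 10
) -> Tuple[str, Dict[str, float]]:
    """Benchmark each named configuration; return (fastest name, rate per name)."""
    if not candidates:
        raise ValueError("need at least one candidate configuration")
    rates = {name: steps_per_second(step, n_steps) for name, step in candidates.items()}
    best = max(rates, key=rates.__getitem__)
    return best, rates


if __name__ == "__main__":
    # Hypothetical stand-ins: replace each lambda with one denoising step run on a
    # model loaded with that compute-unit setting.
    fake_steps = {
        "all": lambda: time.sleep(0.02),
        "cpuAndNeuralEngine": lambda: time.sleep(0.01),
    }
    best, rates = pick_fastest(fake_steps, n_steps=5)
    print(best, {name: round(rate, 1) for name, rate in rates.items()})
```

Warm-up calls are excluded from the timing so that one-time costs, such as first-call setup after loading a model, do not bias the comparison between configurations.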
Great! Thanks for the reply.
Just wanted to reopen this as a question; let me know if there is a better forum.
I noticed that when using `.cpuAndNeuralEngine` I occasionally get very bad performance. For example, I normally get about 1.9 step/s on my Mac mini (M1); however, sometimes that drops to about 0.08 step/s.
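To tell whether a slowdown like this hits a few individual steps or a whole run, it can help to log each step's duration instead of only the average rate. For scale, a drop from about 1.9 to 0.08 step/s means each step takes roughly 24 times longer (1.9 / 0.08 ≈ 23.75), i.e. about 12.5 s instead of about 0.53 s. The minimal sketch below, with hypothetical helper names, records per-step wall-clock time around any step callable and flags steps slower than a chosen multiple of the median. It assumes most steps in a run are normal; a run that is slow from the first step to the last would instead need to be compared against a known-good baseline such as the usual 1.9 step/s.

```python
import statistics
import time
from typing import Callable, List


def time_steps(step: Callable[[], None], n_steps: int) -> List[float]:
    """Return the wall-clock duration, in seconds, of each of `n_steps` calls."""
    durations = []
    for _ in range(n_steps):
        start = time.perf_counter()
        step()
        durations.append(time.perf_counter() - start)
    return durations


def slow_step_indices(durations: List[float], factor: float = 5.0) -> List[int]:
    """Return indices of steps that took more than `factor` times the median step."""
    if not durations:
        return []
    median = statistics.median(durations)
    return [i for i, d in enumerate(durations) if d > factor * median]


if __name__ == "__main__":
    # Simulated run where the fourth step stalls (hypothetical timings).
    delays = iter([0.01, 0.01, 0.01, 0.2, 0.01])
    durations = time_steps(lambda: time.sleep(next(delays)), 5)
    print("step/s per step:", [round(1 / d, 1) for d in durations])
    print("slow steps:", slow_step_indices(durations))
```

Using the median rather than the mean keeps a single 12 s stall from inflating the reference value it is compared against.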
Some notes:
Anyway, I'm happy to open a separate issue if you think it's worth it, but I figured I'd ask first, since it's possible I'm just not understanding something!