Open AndreaChiChengdu opened 1 year ago
As a supplement: I used the diffusers app to run SD1.5 inference on the ANE of my iPhone 15 Pro (A17 Pro), and the end-to-end times were about the same as on M2. The A17 Pro also shows very limited improvement over the A16's 17 TOPS ANE (with the same memory bandwidth). Is this because the workload is memory bound?
I am very confused. How can I find the answer to this question?
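One way to reason about the memory-bound question is a roofline estimate: attainable throughput is the smaller of the compute peak and the memory bandwidth multiplied by the workload's arithmetic intensity (operations per byte moved). The sketch below uses the 35 TOPS and 15.8 TOPS figures quoted in this thread. The bandwidth values (51.2 GB/s for the A17 Pro, 100 GB/s for the M2) and the intensity values are assumptions for illustration, not measurements of the UNet.

```python
def attainable_tops(peak_tops: float, bandwidth_gbs: float, ops_per_byte: float) -> float:
    """Roofline model: min(compute roof, memory roof)."""
    # GB/s * ops/byte = Gops/s; divide by 1000 to get Tops/s.
    memory_roof_tops = bandwidth_gbs * ops_per_byte / 1000.0
    return min(peak_tops, memory_roof_tops)


def ridge_point(peak_tops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity (ops/byte) above which the chip is compute bound."""
    return peak_tops * 1000.0 / bandwidth_gbs


# Peak TOPS come from this thread; bandwidths are ASSUMED values.
chips = {
    "A17 Pro": {"peak_tops": 35.0, "bandwidth_gbs": 51.2},
    "M2": {"peak_tops": 15.8, "bandwidth_gbs": 100.0},
}

# Hypothetical arithmetic intensities, chosen only to show both regimes.
for ops_per_byte in (50, 150, 400, 1000):
    row = ", ".join(
        f"{name}: {attainable_tops(c['peak_tops'], c['bandwidth_gbs'], ops_per_byte):.2f} TOPS"
        for name, c in chips.items()
    )
    print(f"{ops_per_byte:>5} ops/byte -> {row}")

for name, c in chips.items():
    print(f"{name} ridge point: {ridge_point(c['peak_tops'], c['bandwidth_gbs']):.1f} ops/byte")
```

Under these assumed bandwidths, the chip with the lower peak but higher bandwidth comes out ahead at low arithmetic intensity, so a higher TOPS rating alone would not guarantee a faster UNet. Whether the real workload sits in that regime has to be checked by profiling.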
https://www.cpu-monkey.com/en/compare_cpu-apple_a17_pro-vs-apple_m2_pro_12_cpu_19_gpu I think you should also consider their cores.
The UNet runs on the ANE. As the specifications show, both the A17 Pro and M2 ANEs have 16 cores, but the A17 Pro is rated much higher (35 TOPS vs. 15.8 TOPS), yet its performance is worse. That is hard to believe. Any suggestions? @TimYao18 @pcuenca
You cannot look at the "ANE" part alone. The compute unit is "CPU and NE", so maybe the CPU part adds to the M2's score. Or Apple just got screwed.
When using CPU + ANE, the CPU will also use a lot of power.
The CPU will always consume power; that's not the point.
I encourage you to use the Core ML template in Instruments for further analysis. You will see that almost all of the UNet operators (99.89%) are executed on the ANE. The CPU has only a very small workload, and its time is also very small compared to the latency of the ANE computation.
Looking at a finer-grained breakdown, our view is that the UNet's ANE computation time on the A17 Pro is already slightly slower than on M2. Anyway, thanks for your reply, buddy. Have a great weekend~
Thank you for your information.
I ran into a similar problem with the M2 Pro and M2: the M2 Pro runs slower than the M2, and using computeUnit == All is about twice as slow as CPU_AND_NE. Maybe I can use this to check whether something is wrong with the M2 Pro that makes it slower than the M2 when running the UNet.
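For comparing compute-unit settings like this on a Mac, a small timing harness with warmup runs can make the numbers more repeatable. This is only a sketch: `benchmark` is a hypothetical helper, and the stand-in workload below is not a model. The commented lines show how it might be wired to coremltools, assuming a local package named `Unet.mlpackage` and a prepared `inputs` dict, both of which are placeholders.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run_once: Callable[[], object], warmup: int = 3, iters: int = 10) -> Dict[str, float]:
    """Time a callable: discard warmup runs, then report latency statistics in seconds."""
    for _ in range(warmup):
        run_once()  # warmup: the first runs may include compilation and load costs
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
    }


# Stand-in workload so the sketch runs anywhere.
stats = benchmark(lambda: sum(range(100_000)), warmup=2, iters=5)
print(stats)

# Possible use with coremltools on macOS (model path and inputs are placeholders):
# import coremltools as ct
# for unit in (ct.ComputeUnit.CPU_AND_NE, ct.ComputeUnit.ALL):
#     model = ct.models.MLModel("Unet.mlpackage", compute_units=unit)
#     print(unit, benchmark(lambda: model.predict(inputs)))
```

Reporting the median rather than the mean keeps a single slow run from skewing the comparison between settings.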
Hello, as the title indicates, and as the snapshot of the Stable Diffusion XL benchmark in this project shows, the A17 Pro's 35 TOPS ANE performs worse than the M2's 15.8 TOPS ANE. Is there any other reason besides the large memory bandwidth gap?
It seems that the A17 Pro's 35 TOPS of compute is not being used effectively at all.