Closed: cchance27 closed this issue 12 months ago
These are great questions. ANE is faster than MPS. It doesn't seem like much compared to most CUDA builds, but it is a significant difference: 50-100% faster in my experience. It depends on the machine, and I'd happily see other people's results. Speaking of which, some benchmark data in the README could be useful. I can't remember the exact details, but I think GPU Core ML is marginally faster than MPS, though that could also depend on the machine. CPU+GPU+ANE is supported with the ALL option. GPU+ANE I'm not sure about, but all of the options require CPU, so you probably can't run ANE+GPU without CPU. BTW, in this particular use case, CPU_AND_ANE seems to be much faster than ALL.
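For reference, a minimal sketch of how these options map onto the `coremltools` Python API (an assumption on my part that this is the relevant loading path; `model.mlpackage` is a placeholder). Note the enum only exposes the four combinations below, which matches the point above that every option includes the CPU:

```python
# Mapping from the option names discussed in this thread to the
# coremltools ComputeUnit enum members (CPU_AND_ANE -> CPU_AND_NE).
UNIT_NAMES = {
    "ALL": "ALL",                 # CPU + GPU + ANE
    "CPU_AND_GPU": "CPU_AND_GPU",
    "CPU_AND_ANE": "CPU_AND_NE",  # CPU + Neural Engine, no GPU
    "CPU_ONLY": "CPU_ONLY",
}

def load_model(path, unit="CPU_AND_ANE"):
    """Load a Core ML model pinned to the given compute-unit option."""
    import coremltools as ct  # imported lazily; requires coremltools installed
    return ct.models.MLModel(path, compute_units=getattr(ct.ComputeUnit, UNIT_NAMES[unit]))

# Usage (placeholder path):
# model = load_model("model.mlpackage", unit="CPU_AND_ANE")
```

There is deliberately no GPU+ANE entry to write: the enum has no CPU-less combination.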
I was sitting here thinking, and I wonder if ALL being slightly slower might be a memory-bandwidth issue depending on the GPU. My M3 Pro, for instance, has 150 GB/s of memory bandwidth, down from the previous M2 Pro; only the Max chips get really high bandwidth.
Ya, benchmarks in the README would be nice for the various options.
Tested on an M3 MBP (32 GB) with a 512x512 SD1.5 model, using the converter method of rendering from a safetensors file.
Getting ~2.55 on ANE with SPLIT_EINSUM (SPLIT_EINSUM_V2 seems basically the same)
Getting ~2.32 on GPU with ORIGINAL
Getting ~1.48 on ALL with SPLIT_EINSUM
Getting ~2.06 on ALL with ORIGINAL
So ya, ANE is best with SPLIT_EINSUM... however I noticed something odd when I bump the resolution to 768x512:
1.04 ANE with SPLIT_EINSUM
1.32 GPU with ORIGINAL
So for the smaller image ANE is ~10% faster, but at the larger size the GPU suddenly comes out ahead (~27% faster by those numbers), and the ANE is overall about 2.5x slower for an image only 50% larger.
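Quick arithmetic on the rates above (units assumed to be it/s, since the thread doesn't say). Some super-linear slowdown is expected: the latent token count scales with pixel area and attention cost grows roughly with the square of the token count, so a 1.5x pixel increase can plausibly cost up to ~2.25x in attention alone:

```python
# Ratios from the numbers reported above.
small = 512 * 512
large = 768 * 512
pixel_ratio = large / small    # 1.5x the pixels

ane_slowdown = 2.55 / 1.04     # ANE: ~2.45x slower at 768x512
gpu_slowdown = 2.32 / 1.32     # GPU: ~1.76x slower at 768x512

# The GPU stays between linear (1.5x) and quadratic-attention (2.25x),
# while the ANE overshoots even the quadratic estimate.
print(f"pixels: {pixel_ratio:.2f}x, ANE: {ane_slowdown:.2f}x, GPU: {gpu_slowdown:.2f}x")
```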
Already clarified; will close this off since it was discussed and you're working on benchmarks elsewhere (#18).
Just wondering, have you run any comparisons between running against MPS (Metal) vs using the ANE for inference?