Proryanator opened 4 months ago
@Proryanator, thank you for your interest. Unfortunately, we don't. It would be great to have the first Apple silicon implementation of SHMT, which you are targeting.
@jk78346
https://machinelearning.apple.com/research/neural-engine-transformers
I was reading about how Core ML handles various ML workloads, and it sounds analogous to SHMT, especially this part:
"Core ML then seamlessly blends CPU, GPU, and ANE (if available) to create the most effective hybrid execution plan exploiting all available engines on a given device."
I would of course need to test this to see which workloads exercise both the GPU and NPU. Reading this doc, it is possible when converting models to Core ML to have them target one specific compute unit, which is good to know (it may be useful for improved optimizations later on).
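As a sketch of what that per-compute-unit targeting looks like with the Python `coremltools` package (the toy model and file names here are hypothetical, just to make the example self-contained):

```python
import torch
import coremltools as ct

# Hypothetical model: any traceable torch module would do.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)

# Convert to Core ML, restricting execution to CPU + Neural Engine.
# Other options: ct.ComputeUnit.ALL (the default, which lets Core ML
# blend CPU/GPU/ANE as described above), CPU_ONLY, and CPU_AND_GPU.
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=example.shape)],
    compute_units=ct.ComputeUnit.CPU_AND_NE,
)
mlmodel.save("model.mlpackage")
```

Converting the same model with different `compute_units` values and comparing latencies would be one way to see which engines a given workload actually benefits from.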
The graphic from that article is really nice for seeing which parts of the system were used when running a Core ML workload.
Will share any findings I make!
This is great work! The speed implications are insane.
I'd be interested in making an SHMT framework implementation for Apple silicon! Other than this project, which targets the hardware specified, is there any active development on SHMT for Apple silicon that you know of?
The non-Nvidia portion should run with Docker, but it definitely won't run on iOS without extra work.