escalab / SHMT

SHMT for MICRO 2023

Apple Silicon/iOS Implementation #5

Open Proryanator opened 4 months ago

Proryanator commented 4 months ago

This is great work! The speed implications are insane.

I'd be interested in making an SHMT framework implementation for Apple silicon! Other than this project here for the specified hardware, is there any active development on SHMT for Apple silicon that you know of?

The non-Nvidia portion should run with Docker, but it definitely won't run on iOS without extra work.

jk78346 commented 4 months ago

@Proryanator, thank you for your interest. Unfortunately, we don't know of any. It would be great to have the first Apple silicon implementation of SHMT, which you are targeting.

Proryanator commented 4 months ago

@jk78346

https://machinelearning.apple.com/research/neural-engine-transformers

I was reading about how Core ML handles various ML workloads; it sounds analogous to SHMT, especially:

"Core ML then seamlessly blends CPU, GPU, and ANE (if available) to create the most effective hybrid execution plan exploiting all available engines on a given device."

I would need to test this out, of course, to see which workloads exercise both the GPU and the NPU. Reading the doc, it is possible when converting models to Core ML to have them target one specific compute unit, which is good to know (it may be useful for further optimizations later on).
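As a rough sketch of what that targeting looks like on-device, Core ML's `MLModelConfiguration.computeUnits` lets you pin a model to specific engines when loading it (the model name below is a hypothetical placeholder, not anything from this repo):

```swift
import CoreML

// Minimal sketch, assuming a compiled model "MyModel.mlmodelc" is bundled
// with the app ("MyModel" is a placeholder name for illustration).
let config = MLModelConfiguration()

// .all lets Core ML build its hybrid CPU/GPU/ANE execution plan; the other
// cases (.cpuOnly, .cpuAndGPU, .cpuAndNeuralEngine) pin execution, which is
// handy for profiling which engines a given workload actually exercises.
config.computeUnits = .cpuAndGPU

let url = Bundle.main.url(forResource: "MyModel", withExtension: "mlmodelc")!
let model = try MLModel(contentsOf: url, configuration: config)
```

Loading the same model once per `computeUnits` case and timing predictions would be one way to measure how much each engine contributes on a given chip.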

This graphic here from that article is really nice to be able to see what parts of the system were used when running a CoreML workload:


Will share any findings I make!