qdrddr opened this issue 7 months ago (status: Open)
This is about running LLMs locally on Apple Silicon. Core ML is a framework that can distribute workloads across the CPU, the GPU, and the Neural Engine (ANE). The ANE is available on modern Apple devices, including iPhones and Macs; Apple's Transformer optimizations target A14 or newer and M1 or newer chips. Ideally, we want to run LLMs on the ANE alone, since it is designed specifically for ML workloads, unlike the general-purpose GPU. Apple reports up to 10 times faster inference and 14 times lower peak memory: "deploying your Transformer models on Apple devices with an A14 or newer and M1 or newer chip to achieve up to 10 times faster and 14 times lower peak memory consumption compared to baseline implementations".
https://machinelearning.apple.com/research/neural-engine-transformers
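The CPU/GPU/ANE split described above is controlled by the set of compute units a Core ML model is allowed to use. Below is a minimal sketch, assuming the Python coremltools package (a release recent enough to include ComputeUnit.CPU_AND_NE) running on macOS; the helper names compute_units_for and load_model are hypothetical and not part of any existing project. It illustrates that Core ML has no "GPU plus ANE only" setting: every option keeps the CPU available, so permitting both the GPU and the ANE means ComputeUnit.ALL.

```python
"""Sketch: choosing which Apple Silicon compute units Core ML may use.

Assumes coremltools is installed on a Mac; the helper names are hypothetical.
"""


def compute_units_for(use_gpu: bool, use_ane: bool) -> str:
    """Return the name of the coremltools.ComputeUnit member to request.

    Every Core ML option keeps the CPU available, so there is no
    "GPU + ANE only" choice: allowing both means ComputeUnit.ALL.
    """
    if use_gpu and use_ane:
        return "ALL"          # CPU + GPU + Neural Engine
    if use_ane:
        return "CPU_AND_NE"   # CPU + Neural Engine, no GPU
    if use_gpu:
        return "CPU_AND_GPU"  # CPU + GPU, no Neural Engine
    return "CPU_ONLY"         # CPU only


def load_model(path: str, use_gpu: bool = True, use_ane: bool = True):
    """Load a Core ML model (.mlmodel or .mlpackage) limited to the requested units."""
    # Imported lazily so compute_units_for() works without coremltools installed.
    import coremltools as ct

    units = getattr(ct.ComputeUnit, compute_units_for(use_gpu, use_ane))
    # compute_units only permits units; Core ML still decides per operation
    # which of the allowed units actually executes it.
    return ct.models.MLModel(path, compute_units=units)
```

For example, load_model("model.mlpackage", use_gpu=False) (with a hypothetical path) would request CPU_AND_NE, the closest match to the "ANE only" ideal, while the defaults request ALL, which allows Core ML to use both the ANE and the GPU.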
Under the internal name "Project ACDC," Apple is reportedly developing Apple Silicon chips designed specifically for server farms dedicated to AI processing. The company aims to optimize AI applications in its data centers for future versions of its platforms.
@taylorgoolsby, in reply to "Do you know of a model on Hugging Face?": https://huggingface.co/apple/mistral-coreml
Description
Please consider adding support for the Core ML model package format to utilize the Apple Silicon Neural Engine and GPU.
Success Criteria
Utilize both the ANE and the GPU on Apple Silicon, not just the GPU.
Additional Context
A list of models in the Core ML package format: https://github.com/likedan/Awesome-CoreML-Models
There is a work-in-progress Core ML implementation for whisper.cpp that you might be interested in; they see roughly 3x performance improvements for some models (https://github.com/ggerganov/whisper.cpp/discussions/548).
You might also be interested in another implementation, Swift Transformers. An example Core ML application: https://github.com/huggingface/swift-chat