AugustDev / enchanted

Enchanted is an iOS and macOS app for chatting with private, self-hosted language models such as Llama2, Mistral or Vicuna using Ollama.
Apache License 2.0
2.63k stars, 167 forks

Feature Request: Apple Silicon Neural Engine - Core ML model package format support #101

Open qdrddr opened 2 months ago

qdrddr commented 2 months ago

Description

Please consider adding support for the Core ML model package format to utilize the Apple Silicon Neural Engine (ANE) + GPU.

Success Criteria

Utilize both the ANE and the GPU on Apple Silicon, not just the GPU.

Additional Context

List of models in Core ML package format: https://github.com/likedan/Awesome-CoreML-Models

There is a work-in-progress Core ML implementation for whisper.cpp that you might be interested in; they report roughly 3x performance improvements for some models (https://github.com/ggerganov/whisper.cpp/discussions/548).

You might also be interested in another implementation, Swift Transformers. An example Core ML application: https://github.com/huggingface/swift-chat

Core ML is a framework that can distribute workloads across the CPU, GPU and Neural Engine (ANE). The ANE is available on all modern Apple devices: iPhones and Macs (A14 or newer and M1 or newer). Ideally, we want to run LLMs on the ANE only, as it is optimized for running ML tasks compared to the GPU. Apple claims that "deploying your Transformer models on Apple devices with an A14 or newer and M1 or newer chip" can "achieve up to 10 times faster and 14 times lower peak memory consumption compared to baseline implementations".

  1. To use Core ML, you first need to convert a model from TensorFlow or PyTorch to the Core ML model package format using coremltools (or simply use an existing model that is already in Core ML package format).
  2. Second, you must use that converted package with an implementation designed for Apple devices. Here is Apple's Xcode reference PyTorch implementation:

https://machinelearning.apple.com/research/neural-engine-transformers

vanhalt commented 2 months ago

Hey there!

Sorry if I am misunderstanding the feature request.

How exactly would Enchanted interact with models that take advantage of the ANE and GPU? If I understand correctly, this software only interacts through APIs with Ollama and OpenAI.

Am I missing something? Would love to understand!

qdrddr commented 2 months ago

Core ML is Apple's tech. You convert a model into Core ML format and it executes faster on Apple Silicon, like 3 times faster. @vanhalt

bgiesing commented 1 month ago

> Core ML is Apple's tech. You convert a model into Core ML format and it executes faster on Apple Silicon, like 3 times faster.

That would be something to request from Ollama, not Enchanted. Enchanted is simply a native Mac GUI for Ollama, which does all the actual model handling, generation, etc. Ollama needs to support Core ML.
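To illustrate why the change belongs in Ollama, here is a minimal sketch of a generic Ollama client (not Enchanted's actual code, which is written in Swift). It assumes Ollama is listening on its default local port and uses the `/api/generate` endpoint; `build_generate_request` and `generate` are hypothetical helper names. The client only sends a model name and a prompt over HTTP: loading the model and choosing the hardware it runs on all happen inside the Ollama server.

```python
import json
import urllib.request

# Assumed default local Ollama endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the request; the server, not the client, runs the model."""
    req = build_generate_request(model, prompt)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # Only inspects the request; calling generate() needs a running Ollama server.
    req = build_generate_request("llama2", "Why is the sky blue?")
    print(req.get_method(), req.full_url)
```

Nothing in the request says anything about CPU, GPU or ANE, so a client like Enchanted has no way to opt into Core ML unless the server implements it.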