ML models latch onto devices
Latched provides easy-to-use pipelines to run ML models on a variety of devices, such as mobile phones, NVIDIA Jetson, Intel CPUs, and other accelerators.
Latched covers both converting models and deploying them (via the Latched Model Manager and the Latched Devices SDKs).
🤖 Supported ML Tasks
📚 Text:
- Small Language Models, for on-device chatbots and text analysis
  - Llama-3.1-8B-Instruct + OmniQuantW3A16 @ iPhone 15 Pro (coming soon)
  - Other models will be supported soon
- Other tasks will be supported soon
🏞️ Vision:
- Object Detection (coming soon)
- Image Classification (coming soon)
- Other tasks will be supported soon
🗣️ Audio:
- Speech to Text, Automatic Speech Recognition (coming soon)
- Other tasks will be supported soon
Supported Frameworks:
🧩 Latched Components
Latched: The Latched Python library provides hardware-aware optimization. With it, you can export your ML models into hardware-optimized formats.
Latched Model Manager: The Latched Model Manager provides a RESTful API to register and run ML models on various devices (a rough sketch follows this list).
Latched Devices SDKs: The Latched Devices SDKs provide libraries to run ML models on various devices.
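As a rough illustration of how these components fit together, the sketch below registers an exported model artifact with the Model Manager over HTTP. The base URL, route, and payload fields here are hypothetical placeholders, not the documented Model Manager API.

```python
import requests

# Hypothetical sketch: register a hardware-optimized model artifact with the
# Latched Model Manager over its RESTful API. The URL, route, and JSON fields
# below are illustrative assumptions, not the actual Model Manager interface.
MANAGER_URL = "http://localhost:8000"

payload = {
    "name": "llama-3.1-8b-instruct",
    "format": "onnx",                    # e.g. onnx, coreml, openvino
    "target_device": "iphone-15-pro",    # device the artifact was optimized for
    "artifact_path": "exports/llama-3.1-8b-instruct.onnx",
}

response = requests.post(f"{MANAGER_URL}/models", json=payload, timeout=30)
response.raise_for_status()
print("Registered:", response.json())
```

On the device side, the Latched Devices SDKs provide the runtime that actually executes the registered model.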
🚀 Getting Started
Installation
- Clone the repository
git clone https://github.com/TBD-Labs-AI/latched.git
cd latched
- Create a virtual environment with Python 3.11.9 and activate it.
conda create -n latched python=3.11.9
conda activate latched
- Install the dependencies with Poetry
pip install poetry
poetry install
- Launch the test script (ONNX export)
python examples/llama-3.1-8B-Instruct-to-onnx/llama_onnx_example.py
How to use Latched
- Export Hugging Face models to the ONNX format (a rough sketch follows below)
- Export Hugging Face models to the OpenVINO format
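The bundled example script wraps this kind of export flow. As a standalone sketch of what exporting a Hugging Face causal LM to ONNX looks like (using Hugging Face Optimum directly, not the Latched API itself):

```python
# Standalone sketch using Hugging Face Optimum, not the Latched API itself:
# load a Hugging Face causal-LM checkpoint, convert it to ONNX, and save it.
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # any causal-LM checkpoint works

# export=True converts the PyTorch weights into an ONNX graph on load
model = ORTModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

model.save_pretrained("llama-3.1-8b-instruct-onnx")
tokenizer.save_pretrained("llama-3.1-8b-instruct-onnx")

# The OpenVINO path is analogous via optimum-intel:
#   from optimum.intel import OVModelForCausalLM
#   ov_model = OVModelForCausalLM.from_pretrained(model_id, export=True)
```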
📚 Model Hub
coming soon
Contributing
Do you believe the future of AI is in edge computing? Do you want to make it happen?
Join Latched as a contributor!
If you want to contribute to Latched, please read the CONTRIBUTING.md file.
📅 Milestones
SEP 2024
- [ ] Optimize Phi 3.5 mini model
- [ ] Export Phi 3.5 mini model to
  - [ ] CoreML
  - [ ] TensorFlow Lite
  - [ ] TensorRT
  - [ ] OpenVINO
  - [ ] ONNX
- [ ] Optimize Phi 3.5 mini model for
  - [ ] Apple iPhone 15 Pro
  - [ ] Samsung Galaxy S24
  - [ ] Nvidia Jetson
  - [ ] Intel CPU
  - [ ] Intel Gaudi2
  - [ ] Rebellion ATOM
  - [ ] AWS Inferentia
- [ ] Register Phi 3.5 mini model to Model Manager
- [ ] Create Swift example code to run
  - [ ] Phi 3.5 mini model on Apple iPhone 15 Pro
  - [ ] Phi 3.5 mini model on Samsung Galaxy S24
  - [ ] Phi 3.5 mini model on Nvidia Jetson
  - [ ] Phi 3.5 mini model on Intel CPU
  - [ ] Phi 3.5 mini model on Intel Gaudi2
  - [ ] Phi 3.5 mini model on Rebellion ATOM
  - [ ] Phi 3.5 mini model on AWS Inferentia
- [ ] Release Benchmark Dashboard of Phi 3.5 mini model on each device
🤝 Acknowledgements
This repository uses the following third-party libraries: