brilliantlabsAR / noa-for-ios

You AI companion. ChatGPT and translation for Monocle AR
ISC License
70 stars 14 forks source link
ai ar chatgpt ios whisper

Noa for iOS: AI Chat for Monocle with iOS Devices

Copyright 2023 Brilliant Labs Ltd.

Overview

Noa for iOS is an application that pairs to your Monocle and empowers you with access to ChatGPT anywhere you need it. Simply tap to speak a question and see the response appear in your field of view. The iOS application can also function as a standalone chat interface to ChatGPT, allowing queries to be entered via the iOS keyboard.

iOS screenshot

Getting Started

Getting started is easy:

OpenAI API Key

For Developers

We encourage developers to extend Noa for iOS or to use it as a template project for their own Monocle apps. This section provides a brief overview of the program structure.

Key iOS App Source Files

The iOS project is located in ios/Noa/. Open ios/Noa/Noa.xcodeproj using Xcode. All source files and assets are in ios/Noa/Noa/. The key files to start with are:

Monocle Scripts, Firmware, and FPGA Images

The Python code that runs on Monocle is stored in ios/Noa/Noa/Monocle Assets/Scripts/. All files are uploaded to Monocle, which is then instructed to run main.py. Communication with the iOS app uses the data characteristic.

Firmware (MicroPython) is stored in ios/Noa/Noa/Monocle Assets/Firmware/. Each time Monocle connects, the app checks the current firmware version to make sure it is the one expected by Noa and if needed, uploads the correct version.

FPGA images are located in ios/Noa/Noa/Monocle Assets/FPGA/. The app also checks to ensure the correct FPGA image is loaded.

iOS/Monocle Communication Flow

From the perspective of the iOS companion app, communication with Monocle is driven by a state machine:

The state machine makes use of the Swift language's data-carrying enum feature to pass some state information in the state enums themselves. Some state is also retained through members of Controller. In order to present a single progress bar during the firmware and FPGA update sequences, the app needs to know whether a DFU update has just completed in order to properly scale the FPGA progress percentage. This is done by passing a didFinishDFU boolean with the initial states. In other words, this is purely for cosmetic purposes.

The running state is where most of the work happens. Monocle communicates with the app using a simple protocol. All commands are handled in onMonocleCommand in Controller.swift. Although technically stateful, the protocol was designed so that the iOS app can simply react to each command. Each command is 4 characters followed by optional command-specific data:

Translation Support

Noa supports translation from any language supported by Whisper to English. Enable this in the settings menu. When speaking through Monocle, Whisper is used to perform this translation automatically without involving ChatGPT. When using the iOS app to type statements, ChatGPT is employed. Controller operates in two modes, assistant and translator. The mode is passed to the ChatGPT module, which uses a different system prompt for each to accomplish the desired task.

Python Script Versioning

A SHA-256 digest is computed from the Python scripts and their filenames by concatenating them all together. Then, just before transmitting them, the version string is inserted into the source code. Therefore, if Noa for iOS is already running on Monocle, ARGPT_VERSION will have been defined and can be checked against the iOS app's Python scripts.

Audio Format

As of the initial version, 8-bit 8KHz mono audio is sent from Monocle in order to minimize the transmission time. iOS AVAudioPCMBuffer does not support this format natively but the conversion to a 16-bit buffer is trivial.

Whisper expects 16-bit 16KHz audio. A drawback of the 8-bit sampling is loss of dynamic range and increased sensitivity to background noise.