argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
http://argmaxinc.com/blog/whisperkit
MIT License

Added MLX Audio Encoder #139

Closed: jkrukowski closed this 5 months ago

jkrukowski commented 6 months ago

This PR adds an MLX audio encoder.

The implementation is based on the AudioEncoder from the mlx-examples repository.

To verify that the audio encoder works as expected, I have added weight-loading functionality. The weights come from the https://huggingface.co/jkrukowski/whisper-tiny-mlx-safetensors repository, which contains the whisper-tiny-mlx weights converted to the safetensors format (for now, MLX Swift cannot load .npz files). I have also added an MLX weights download step to the Makefile so that the tests run correctly. This step could be removed once the MLX branch is fully integrated into the main repository and we decide on the best way to handle the weights.
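For anyone who wants to check which tensors a converted file contains before wiring it into the encoder, here is a minimal sketch that reads only the safetensors header using Foundation. It is not the loading code from this PR. It relies on the published safetensors layout (an 8-byte little-endian header length, followed by a JSON header that maps tensor names to `dtype`, `shape`, and `data_offsets`, plus an optional `__metadata__` entry). The names `TensorInfo`, `SafetensorsError`, and `parseSafetensorsHeader` are hypothetical.

```swift
import Foundation

/// Metadata for one tensor entry in a safetensors header (hypothetical type, not from the PR).
struct TensorInfo: Decodable {
    let dtype: String
    let shape: [Int]
    let dataOffsets: [Int]

    enum CodingKeys: String, CodingKey {
        case dtype, shape
        case dataOffsets = "data_offsets"
    }
}

enum SafetensorsError: Error {
    case truncated
    case invalidHeader
}

/// Parses the JSON header of a safetensors blob and returns the tensor entries by name.
/// Only the header is read; the tensor bytes that follow it are left untouched.
func parseSafetensorsHeader(_ data: Data) throws -> [String: TensorInfo] {
    guard data.count >= 8 else { throw SafetensorsError.truncated }

    // The first 8 bytes hold the header length as a little-endian UInt64.
    var length: UInt64 = 0
    for (i, byte) in data.prefix(8).enumerated() {
        length |= UInt64(byte) << (8 * i)
    }
    guard length <= UInt64(data.count - 8) else { throw SafetensorsError.truncated }

    let start = data.startIndex + 8
    let headerData = data.subdata(in: start ..< start + Int(length))

    guard var object = try JSONSerialization.jsonObject(with: headerData) as? [String: Any] else {
        throw SafetensorsError.invalidHeader
    }
    // "__metadata__" is free-form string metadata, not a tensor entry, so drop it.
    object.removeValue(forKey: "__metadata__")

    let cleaned = try JSONSerialization.data(withJSONObject: object)
    return try JSONDecoder().decode([String: TensorInfo].self, from: cleaned)
}
```

Reading the downloaded weights file into a `Data` value and passing it to this function gives the list of tensor names and shapes, which can then be compared against the parameter names the encoder expects.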

I have changed the project structure a bit. I have moved the common test utilities to the WhisperKitTestsUtils target, along with the resources used for testing (audio files and models). This way we can reuse the resources in both MLX and non-MLX tests. It also simplifies the project structure: the WhisperKitTests target no longer needs a custom path and a bunch of excluded files.
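The resulting layout could look roughly like the manifest sketch below. This is an illustration under assumptions, not the actual Package.swift from this PR: the tools version, the `Resources` directory name, and the `WhisperKitMLXTests` target name are hypothetical, and the real manifest has additional targets and dependencies.

```swift
// swift-tools-version:5.9
// Illustrative sketch only; the tools version and several names below are assumptions.
import PackageDescription

let package = Package(
    name: "WhisperKit",
    targets: [
        .target(name: "WhisperKit"),

        // Shared test helpers plus the audio files and models used by the tests.
        // A regular target (not a test target) so other test targets can depend on it.
        .target(
            name: "WhisperKitTestsUtils",
            dependencies: ["WhisperKit"],
            resources: [.copy("Resources")] // assumed directory name
        ),

        // Uses the default Tests/WhisperKitTests path, with no custom path or exclude list.
        .testTarget(
            name: "WhisperKitTests",
            dependencies: ["WhisperKit", "WhisperKitTestsUtils"]
        ),

        // Hypothetical MLX test target that reuses the same utilities and resources.
        .testTarget(
            name: "WhisperKitMLXTests",
            dependencies: ["WhisperKit", "WhisperKitTestsUtils"]
        ),
    ]
)
```

Keeping the resources in one regular target means each test target reaches them through that target's bundle instead of duplicating the files or excluding them per target.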