argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
http://argmaxinc.com/blog/whisperkit
MIT License
3.89k stars 328 forks

Updating swift-transformer #247

Open BrandonWeng opened 1 week ago

BrandonWeng commented 1 week ago

Hey folks, was wondering what it would take to upgrade swift-transformer to the latest version?

Apologies, I'm totally new to Swift. Happy to make the PR if there are no known blockers.

ZachNagengast commented 1 week ago

@BrandonWeng Is there a specific feature you're looking for in the latest version? We upgraded to the point right before jinja was added, to avoid another dependency that we don't have much use for, but would consider upgrading if there's a need.
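For context, this kind of pin lives in the package manifest. A minimal `Package.swift` sketch of holding a dependency below a given release (the package name, version range, and product name here are illustrative assumptions, not WhisperKit's actual manifest):

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyApp",
    dependencies: [
        // Pin swift-transformers to a range below the release that
        // introduced the jinja dependency (version numbers illustrative).
        .package(url: "https://github.com/huggingface/swift-transformers",
                 "0.1.8"..<"0.1.13"),
    ],
    targets: [
        .executableTarget(
            name: "MyApp",
            dependencies: [
                .product(name: "Transformers", package: "swift-transformers")
            ])
    ]
)
```

A range pin like this is why a downstream package that requires a newer version (as the MLX examples do) fails dependency resolution until the upper bound is raised.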

BrandonWeng commented 1 week ago

I think we found a way around this.

We were trying to get MLX running, but the examples required swift-transformers > 0.1.12:

https://github.com/ml-explore/mlx-swift-examples/tree/main

BrandonWeng commented 1 week ago

I'll just leave this here: https://github.com/argmaxinc/WhisperKit/pull/249

Happy to close the issue + PR if you don't think it's necessary. Just wanted to leave it here in case other folks run into the same issue. I spent several hours trying to work around it, but this turned out to be the simplest solution for us.

ZachNagengast commented 1 week ago

Thanks! Curious to hear more about the approach you're taking with MLX; we have a PR in progress that still needs a couple of perf improvements: #200

BrandonWeng commented 1 week ago

Unfortunately, I'm pretty new to Swift and its ecosystem as a whole. I'm just trying out a bunch of different things right now. Will report back once I have a better understanding!

For now, I've only been comparing the MLX models. The quantized models use significantly less memory: mlx-community/Llama-3.2-1B-Instruct-bf16 uses around 2.5GB of memory, while mlx-community/Llama-3.2-1B-Instruct-8bit uses around 1.5GB. Performance-wise, bf16 isn't too far off from 8bit.
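Those numbers line up with a back-of-envelope estimate, since model weights dominate the footprint. A quick sketch (the ~1.24B parameter count for Llama-3.2-1B is an assumption; quantization scales and runtime buffers add overhead on top):

```swift
// Rough weight-memory estimate: bytes per parameter times parameter count.
let params = 1.24e9               // approx. parameter count for Llama-3.2-1B

let bf16GB = params * 2.0 / 1e9   // bf16: 2 bytes/weight -> ~2.5 GB
let int8GB = params * 1.0 / 1e9   // 8-bit: 1 byte/weight -> ~1.24 GB

print("bf16 ≈ \(bf16GB) GB, 8-bit ≈ \(int8GB) GB")
```

The observed ~1.5GB for the 8-bit model is a bit above the raw weight size, which is expected once per-group quantization scales and activation buffers are counted.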