OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

Support float16 on ARM CPUs with native float16 support #1153

Open · FlippFuzz opened this issue 1 year ago

FlippFuzz commented 1 year ago

From https://github.com/guillaumekln/faster-whisper/issues/65


Some CPUs, such as the ARM Neoverse-N1 (Oracle Cloud free tier), support FP16 computation. It would be nice to have this feature because it could give up to a 2x speedup over float32.

I'm just creating the enhancement request and understand that there might not be a focus on this because it only applies to a small subset of CPUs.
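
In case it helps frame the request, here is a minimal sketch of what this could look like through the existing CTranslate2 Python API. `./ct2_model` is a hypothetical converted model directory, and on current builds the CPU backend does not report float16 as supported, so the request falls back or errors depending on the version:

```python
# Sketch only: query what the installed CTranslate2 build supports on CPU,
# and what requesting float16 would look like if this were implemented.
# "./ct2_model" is a hypothetical path to a converted model directory.
import ctranslate2

print(ctranslate2.get_supported_compute_types("cpu"))
# e.g. {'int8', 'int8_float32', 'int16', 'float32'} today; no 'float16' on CPU

translator = ctranslate2.Translator(
    "./ct2_model",
    device="cpu",
    compute_type="float16",  # currently falls back or errors on CPU builds
)
```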

ephemer commented 1 year ago

FWIW, Apple Silicon CPUs also support FP16, so that adds some more potential consumers here.
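
In case it's useful for scoping, here's a rough sketch (my assumption, not anything CTranslate2 exposes) of how native FP16 arithmetic support could be detected on the ARM platforms mentioned so far: the `fphp`/`asimdhp` feature flags in `/proc/cpuinfo` on Linux, and the `hw.optional.arm.FEAT_FP16` sysctl on Apple Silicon.

```python
# Rough sketch: detect native FP16 arithmetic support on ARM.
# Flag and sysctl names are platform conventions, not part of the CTranslate2 API.
import platform
import subprocess

def has_native_fp16() -> bool:
    system = platform.system()
    if system == "Linux":
        try:
            with open("/proc/cpuinfo") as f:
                flags = f.read()
            return "asimdhp" in flags or "fphp" in flags
        except OSError:
            return False
    if system == "Darwin":
        try:
            out = subprocess.run(
                ["sysctl", "-n", "hw.optional.arm.FEAT_FP16"],
                capture_output=True, text=True, check=False,
            )
            return out.stdout.strip() == "1"
        except OSError:
            return False
    return False

print("Native FP16:", has_native_fp16())
```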

nlgtuankiet commented 1 year ago

I would love to use faster-whisper instead of whisper.cpp, but the lack of FP16 on CPU is kind of a deal breaker for me. For now, with faster-whisper I have to choose between speed (int8) and accuracy (float32); FP16 is the missing balance point. Considering that ARM CPUs are very popular nowadays (mobile phones, the Apple M series, cloud providers), I think FP16 support is worth considering. Do you have any thoughts about this, @guillaumekln?
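
To make the trade-off concrete, this is roughly what the choice looks like with the current faster-whisper API on CPU (the model size is just an example):

```python
# Sketch: the current compute_type choice in faster-whisper on CPU.
from faster_whisper import WhisperModel

fast_but_lossy = WhisperModel("small", device="cpu", compute_type="int8")     # speed
accurate_slow  = WhisperModel("small", device="cpu", compute_type="float32")  # accuracy
# wanted = WhisperModel("small", device="cpu", compute_type="float16")        # the missing middle ground
```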

bil-ash commented 11 months ago

It's not just ARM and Apple Silicon; the latest Intel and AMD CPUs (I am using an AMD one) also support float16. It would be very nice if int8_float16 inference were supported on the CPUs that support float16.
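
For context on the x86 side, a quick sketch of what the Linux kernel reports (the flag names are the standard `/proc/cpuinfo` ones, not anything CTranslate2 reads): plain `f16c` only covers FP16<->FP32 conversion, whereas `avx512fp16` indicates native FP16 arithmetic.

```python
# Sketch: which FP16-related flags the Linux kernel reports for an x86 CPU.
# "f16c" = FP16<->FP32 conversion only; "avx512fp16" = native FP16 arithmetic.
flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
            break

print("f16c (conversion only):", "f16c" in flags)
print("avx512fp16 (native FP16 math):", "avx512fp16" in flags)
```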