Sharrnah / whispering

Whispering Tiger - OpenAI's whisper (and other models) with OSC and Websocket support. Allowing live transcription / translation in VRChat and Overlays in most Streaming Applications
MIT License

Facebook's SeamlessM4T Integration (Multilingual + Multimodal) #17

Closed Infinitay closed 1 year ago

Infinitay commented 1 year ago

Facebook just released a new multimodal model that covers multiple languages. I would assume it's the successor to NLLB. One model to rule them all. At first glance, the size of SM4T Large seems to match that of NLLB Large alone. CT2 support would also be great. For Whispering users who take advantage of all the available x-to-x features, this would be a good model to support.

Website: https://ai.meta.com/resources/models-and-libraries/seamless-communication/

Code: https://github.com/facebookresearch/seamless_communication

Paper: https://ai.meta.com/research/publications/seamless-m4t/

Blog Post: https://ai.meta.com/blog/seamless-m4t/

Some Metrics

![image](https://github.com/Sharrnah/whispering/assets/6964154/99ba2dc2-af3b-4375-85cb-a39baa660753)
![image](https://github.com/Sharrnah/whispering/assets/6964154/18a8df3a-d848-42e8-b696-63bf42cfa9b4)
![image](https://github.com/Sharrnah/whispering/assets/6964154/26630236-ff4f-4c0f-b1b8-76a3582b2602)

Sharrnah commented 1 year ago

Unfortunately, it's currently Linux-only, so we have to see if and when they fix it for Windows.

Still interesting, and maybe I'll find a solution myself. They say fairseq2n "can be ignored", though it's a requirement for fairseq2 and only builds on Linux...

Sharrnah commented 1 year ago

Thank you. This is now implemented with the new release (https://github.com/Sharrnah/whispering/releases/tag/v1.3.11.1)

For now it only supports S2TT (speech-to-text translation) and T2TT (text-to-text translation), uses only the transformers library, and has no quantization yet.
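
For anyone curious what T2TT through the Hugging Face transformers library can look like, here is a minimal sketch. It is not the code from the release. The helper name `translate_text` and the checkpoint `facebook/hf-seamless-m4t-medium` are assumptions chosen for illustration, and the first call downloads several gigabytes of weights.

```python
def translate_text(text, src_lang, tgt_lang,
                   model_name="facebook/hf-seamless-m4t-medium"):
    """Text-to-text translation (T2TT) with SeamlessM4T.

    Hypothetical helper, not taken from Whispering Tiger.
    Language codes are three-letter codes such as "eng" or "deu".
    """
    # Imported lazily so the module loads even without transformers/torch installed.
    from transformers import AutoProcessor, SeamlessM4TModel

    # Assumed checkpoint name; loading it downloads the weights on first use.
    processor = AutoProcessor.from_pretrained(model_name)
    model = SeamlessM4TModel.from_pretrained(model_name)

    # Tokenize the source text and tag it with its source language.
    inputs = processor(text=text, src_lang=src_lang, return_tensors="pt")

    # generate_speech=False asks for text tokens only (no audio synthesis).
    output_tokens = model.generate(**inputs, tgt_lang=tgt_lang,
                                   generate_speech=False)

    # The first element holds the generated token sequences.
    return processor.decode(output_tokens[0].tolist()[0],
                            skip_special_tokens=True)


# Example call (not executed here because it loads the full model):
# print(translate_text("Hello, how are you?", "eng", "deu"))
```

Loading the processor and model inside the function keeps the sketch short. A real application would load them once and reuse them across calls.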