ggerganov / ggml


[Feature Request]: Meta SeamlessM4T support #471

Open · zhongwei opened 10 months ago

maxng07 commented 10 months ago

With the latest update of Meta's SeamlessM4T, translator.py can detect whether you have a GPU or only a CPU; if no GPU is found, it falls back to the CPU. I was able to run text-to-text translation on the CPU; I have not tried audio-to-text or text-to-audio. From what I reviewed of the scripts yesterday, translator.py now has these checks.

```
m4t_predict "what's up" t2tt cmn --src_lang eng --model_name 'seamlessM4T_medium'
2023-09-03 05:21:14,576 INFO -- m4t_scripts.predict.predict: Running inference on the CPU in torch.float32.
Using the cached checkpoint of the model 'seamlessM4T_medium'. Set force=True to download again.
Using the cached tokenizer of the model 'seamlessM4T_medium'. Set force=True to download again.
Using the cached checkpoint of the model 'vocoder_36langs'. Set force=True to download again.
2023-09-03 05:21:36,193 INFO -- m4t_scripts.predict.predict: Translated text in cmn: 有什么问题?
```
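For comparison, here is a rough sketch of the same text-to-text run driven through the Python API rather than the CLI. The `Translator` import path, constructor arguments, and `predict()` return values are assumptions based on the seamless_communication examples circa this release, not verified here:

```python
import torch
from seamless_communication.models.inference import Translator

# Pick the device the same way the predict script does:
# CUDA if available, otherwise fall back to the CPU.
device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
dtype = torch.float16 if device.type == "cuda" else torch.float32

# Model and vocoder names match the cached checkpoints in the log above.
translator = Translator("seamlessM4T_medium", "vocoder_36langs", device, dtype)

# "t2tt" = text-to-text translation; the waveform/sample-rate slots of the
# return tuple are only populated for speech tasks.
translated_text, _, _ = translator.predict("what's up", "t2tt", "cmn", src_lang="eng")
print(translated_text)
```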

maxng07 commented 10 months ago

Reviewing this again, the change is not in translator.py but in predict.py (https://github.com/facebookresearch/seamless_communication/blob/main/scripts/m4t/predict/predict.py, lines 62 to 70):

```python
if torch.cuda.is_available():
    device = torch.device("cuda:0")
    dtype = torch.float16
    logger.info(f"Running inference on the GPU in {dtype}.")
else:
    device = torch.device("cpu")
    dtype = torch.float32
    logger.info(f"Running inference on the CPU in {dtype}.")
```

It detects whether a GPU is present and falls back to the CPU if not. I was able to run text-to-text translation on the CPU; it is single-threaded on my 8-core machine. Again, I haven't tried audio-to-text and vice versa, but it does look like SeamlessM4T is supported on the CPU without any work needed.
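Regarding the single-threaded behavior: PyTorch exposes its CPU intra-op thread count directly, so one quick thing to check (a generic PyTorch sketch, nothing specific to SeamlessM4T) is whether the thread pool is being capped at 1:

```python
import torch

# PyTorch typically defaults intra-op parallelism to the number of physical
# cores; if this prints 1, something (e.g. OMP_NUM_THREADS) is capping it.
print(torch.get_num_threads())

# Request all 8 cores before loading the model and running inference.
torch.set_num_threads(8)
```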

maxng07 commented 10 months ago

I did a quick test on text-to-audio/speech; it works on the CPU too:

```
m4t_predict "brother, where are you?" t2st ind --src_lang eng --model_name 'seamlessM4T_medium' --output_path /test.mp3
2023-09-04 04:39:20,577 INFO -- m4t_scripts.predict.predict: Running inference on the CPU in torch.float32.
Using the cached checkpoint of the model 'seamlessM4T_medium'. Set force=True to download again.
Using the cached tokenizer of the model 'seamlessM4T_medium'. Set force=True to download again.
Using the cached checkpoint of the model 'vocoder_36langs'. Set force=True to download again.
2023-09-04 04:40:10,159 INFO -- m4t_scripts.predict.predict: Saving translated audio in ind
2023-09-04 04:40:10,174 INFO -- m4t_scripts.predict.predict: Translated text in ind: Saudara, di mana Anda?
```
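The equivalent text-to-speech call from Python might look roughly like the sketch below. As before, the `Translator` API and the (text, waveform, sample rate) return signature are assumptions drawn from the seamless_communication examples of the time:

```python
import torch
import torchaudio
from seamless_communication.models.inference import Translator

# CPU setup mirroring the CLI run above.
translator = Translator(
    "seamlessM4T_medium", "vocoder_36langs", torch.device("cpu"), torch.float32
)

# "t2st" = text-to-speech translation; assumed to return the translated text
# plus the synthesized waveform and its sample rate.
text, wav, sr = translator.predict(
    "brother, where are you?", "t2st", "ind", src_lang="eng"
)
print(text)  # the CLI run above printed: Saudara, di mana Anda?

# torchaudio.save expects a (channels, frames) tensor; the waveform's exact
# shape may need adjusting depending on the library version.
torchaudio.save("test.wav", wav.squeeze(0).cpu(), sample_rate=sr)
```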

bakkot commented 7 months ago

Meta appears to have done this themselves with the new M4Tv2 release: https://github.com/facebookresearch/seamless_communication/tree/main/ggml

Green-Sky commented 7 months ago

@ggerganov did you know about this?

ggerganov commented 7 months ago

> @ggerganov did you know about this?

Huh, no - very cool!

We should help with the implementation

ggerganov commented 7 months ago

Nice, it even works!

[image]

It would be great to bring it up to date with the latest ggml; that would reduce memory usage and enable GPU support, among other improvements. In any case, having a working implementation is a great help! Very cool to see this from the Meta team ❤️