Camb-ai / MARS5-TTS

MARS5 speech model (TTS) from CAMB.AI
GNU Affero General Public License v3.0

Support for inferencing on Apple Silicon M1/M2/M3 chips using mps #34

Open origin-s20 opened 1 week ago

origin-s20 commented 1 week ago


The current notebook runs successfully on a MacBook Pro (tried on an M3 Pro), but only on the CPU. Even if I set the torch device to mps, it seems to fall back to the CPU. Is there a version of the notebook, or some other way, to run this on MPS for faster inference?

arnavmehta7 commented 1 week ago

Hey, I am on torch 2.4.0 and it works like a charm for me:

import torch

# load model on the Apple Silicon GPU (MPS backend)
device = "mps"
mars5, config_class = torch.hub.load('Camb-ai/mars5-tts', 'mars5_english', device=device, trust_repo=True)
print(f"Mars5 device: {mars5.device}")
arnavmehta7 commented 1 week ago

You can install 2.4.0 / nightly with this command:

conda install pytorch-nightly::pytorch torchvision torchaudio -c pytorch-nightly

nivibilla commented 1 week ago

PR #37 adds auto-detection for this.
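
For readers who want to pick the device themselves, here is a minimal sketch of one common preference order (CUDA, then MPS, then CPU). It is not the code from the PR mentioned above; the helper names `choose_device` and `detect_device` are made up for illustration, while `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are standard PyTorch calls.

```python
def choose_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Pick a backend name, preferring CUDA, then MPS, then CPU."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch which backends are usable (requires torch to be installed)."""
    import torch  # imported lazily so choose_device works without torch

    return choose_device(
        cuda_ok=torch.cuda.is_available(),
        mps_ok=torch.backends.mps.is_available(),
    )
```

The result of `detect_device()` can then be passed as the `device` argument in the `torch.hub.load` call shown earlier in the thread.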

nivibilla commented 1 week ago

Even then, it takes 8-10 minutes to generate 5 words, though.

origin-s20 commented 1 week ago

Thank you @arnavmehta7, yes, it's using the GPU now, but deep clone took ~4 minutes and shallow clone took ~5 minutes (running on an M3 Pro with an 18-core GPU and 18GB unified memory). In both cases the output had a lot of noise and wasn't anywhere close to the input text or voice.
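
When comparing timings like these across devices, it helps to measure them the same way each time. Below is a small sketch of a timing helper; the name `timed` and its `sync` parameter are hypothetical. GPU backends queue work asynchronously, so the optional `sync` callable (for example `torch.mps.synchronize` on Apple Silicon) is called before and after the run so that the measurement covers the whole computation.

```python
import time


def timed(fn, sync=None):
    """Run fn() and return (result, elapsed_seconds).

    sync is an optional zero-argument callable that blocks until queued
    GPU work has finished, e.g. torch.mps.synchronize on Apple Silicon.
    """
    if sync is not None:
        sync()  # drain earlier work so it is not counted
    start = time.perf_counter()
    result = fn()
    if sync is not None:
        sync()  # wait for the work launched by fn to finish
    return result, time.perf_counter() - start
```

Usage would look like `output, seconds = timed(lambda: run_inference(), sync=torch.mps.synchronize)`, where `run_inference` stands in for whatever call produces the audio.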

arnavmehta7 commented 1 week ago

Hey @nivibilla @origin-s20, a few people in the open-source community are trying to optimise the inference speed, so things should become blazingly fast very soon!

Also, someone could try porting this to MLX, which might improve the speed on M-series chips. Currently torch doesn't support a lot of layers on MPS, for example:

The operator 'aten::col2im' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications.

(tested on same specs)
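
To see which operators fall back to the CPU during a run, one option is to record the warnings and pull out the operator names. This is a sketch under two assumptions: that the warning reaches Python's `warnings` module with the wording quoted above, and that it has not already been emitted earlier in the process (PyTorch may show each such warning only once). The helper names `fallback_ops` and `collect_fallback_ops` are hypothetical.

```python
import re
import warnings

# Matches the fallback warning text quoted above and captures the operator name.
_FALLBACK_RE = re.compile(
    r"The operator '([^']+)' is not currently supported on the MPS backend"
)


def fallback_ops(messages):
    """Return the sorted, de-duplicated operator names found in the messages."""
    found = set()
    for msg in messages:
        match = _FALLBACK_RE.search(str(msg))
        if match:
            found.add(match.group(1))
    return sorted(found)


def collect_fallback_ops(fn):
    """Run fn() while recording warnings; return (result, fallback operator names)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record warnings even if filtered by default
        result = fn()
    return result, fallback_ops(w.message for w in caught)
```

Wrapping the inference call in `collect_fallback_ops` gives a list such as `['aten::col2im']`, which shows which layers would benefit most from native MPS support or an MLX port.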

arnavmehta7 commented 1 week ago
