aedocw / epub2tts

Turn an epub or text file into an audiobook
Apache License 2.0

Add support for TTS which fully support Apple Silicon #233

Open AriaShishegaran opened 2 months ago

AriaShishegaran commented 2 months ago

I don't know if this request fully makes sense, but I did some research on the likes of Piper and Tortoise, which have shown the capability to leverage Metal and nightly builds of PyTorch to run more smoothly compared to, say, XTTS (which requires me to pass --no-deepspeed to even run).

So I'm just wondering whether it's possible to integrate such systems into epub2tts for people who want better quality than p307 but also faster generation speeds on a MacBook.

aedocw commented 2 months ago

There's an existing PR that incorporates MPS (https://github.com/aedocw/epub2tts/pull/181). The last time I tested it there was no notable improvement in speed. If it requires a nightly build of PyTorch, that complicates things, though I did intend to incorporate MPS support once it required no additional work from the user.
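
For context, "incorporating MPS" mostly comes down to asking PyTorch whether Apple's Metal backend is available and moving the model to that device. A minimal sketch of that device selection (not the exact code from PR #181):

```python
# Minimal sketch of MPS device selection in PyTorch (illustration only,
# not the exact code from PR #181): prefer CUDA, then Apple's Metal
# backend (MPS), then fall back to CPU.
import torch

if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print(f"Using device: {device}")
```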

If you want to try out that PR and report back on how it works, that would be much appreciated!

As for better quality than VITS (like voice p307), I encourage you to try --engine edge, which I have found to provide the most reliable and highest-quality voices yet. As impressive as XTTS is, the repeats and gibberish get really annoying when you're doing a long book, and they end up cropping up frequently. Personally I'm using MS Edge for encodings now, and I've been going back and re-encoding older books that I might listen to again because it's significantly better IMO.
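
For reference, the edge engine uses Microsoft's hosted neural voices rather than local inference, which is why its quality and speed don't depend on your hardware. A rough sketch of that kind of synthesis using the community edge-tts package (the voice name is just an example, not necessarily what epub2tts picks by default):

```python
# Illustrative sketch with the edge-tts package: synthesize one chunk of
# text to an mp3 using a hosted Microsoft neural voice. The voice name is
# an example, not necessarily epub2tts's default.
import asyncio
import edge_tts

async def synth_chunk(text: str, voice: str = "en-US-AriaNeural", out_path: str = "chunk.mp3"):
    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(out_path)

asyncio.run(synth_chunk("A short test sentence for the edge engine."))
```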

AriaShishegaran commented 2 months ago

@aedocw Thanks. Edge provides great quality. I'm just wondering how XTTS can be further leveraged. I checked out the mps branch and ran the code again. Do I still need to pass --no-deepspeed for the code to run? I'm getting this error no matter which branch. And if so, should I still watch for any change in performance when --no-deepspeed is passed?

Total characters: 661976
Not enough VRAM on GPU or CUDA not found. Using CPU
Loading model: /Users/lucifer/Library/Application Support/tts/tts_models--multilingual--multi-dataset--xtts_v2
 > tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
 > Using model: xtts
[2024-04-07 22:35:15,124] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to mps (auto detect)
W0407 22:35:15.203000 8635480768 torch/distributed/elastic/multiprocessing/redirects.py:27] NOTE: Redirects are currently not supported in Windows or MacOs.
Traceback (most recent call last):
  File "/Users/lucifer/Downloads/Codes/epub2tts/.venv/bin/epub2tts", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/lucifer/Downloads/Codes/epub2tts/.venv/lib/python3.11/site-packages/epub2tts.py", line 837, in main
    mybook.read_book(
  File "/Users/lucifer/Downloads/Codes/epub2tts/.venv/lib/python3.11/site-packages/epub2tts.py", line 452, in read_book
    self.model.load_checkpoint(
  File "/Users/lucifer/Downloads/Codes/epub2tts/.venv/lib/python3.11/site-packages/TTS/tts/models/xtts.py", line 783, in load_checkpoint
    self.gpt.init_gpt_for_inference(kv_cache=self.args.kv_cache, use_deepspeed=use_deepspeed)
  File "/Users/lucifer/Downloads/Codes/epub2tts/.venv/lib/python3.11/site-packages/TTS/tts/layers/xtts/gpt.py", line 222, in init_gpt_for_inference
    import deepspeed
  File "/Users/lucifer/Downloads/Codes/epub2tts/.venv/lib/python3.11/site-packages/deepspeed/__init__.py", line 26, in <module>
    from . import module_inject
  File "/Users/lucifer/Downloads/Codes/epub2tts/.venv/lib/python3.11/site-packages/deepspeed/module_inject/__init__.py", line 6, in <module>
    from .replace_module import replace_transformer_layer, revert_transformer_layer, ReplaceWithTensorSlicing, GroupQuantizer, generic_injection
  File "/Users/lucifer/Downloads/Codes/epub2tts/.venv/lib/python3.11/site-packages/deepspeed/module_inject/replace_module.py", line 607, in <module>
    from ..pipe import PipelineModule
  File "/Users/lucifer/Downloads/Codes/epub2tts/.venv/lib/python3.11/site-packages/deepspeed/pipe/__init__.py", line 6, in <module>
    from ..runtime.pipe import PipelineModule, LayerSpec, TiedLayerSpec
  File "/Users/lucifer/Downloads/Codes/epub2tts/.venv/lib/python3.11/site-packages/deepspeed/runtime/pipe/__init__.py", line 6, in <module>
    from .module import PipelineModule, LayerSpec, TiedLayerSpec
  File "/Users/lucifer/Downloads/Codes/epub2tts/.venv/lib/python3.11/site-packages/deepspeed/runtime/pipe/module.py", line 19, in <module>
    from ..activation_checkpointing import checkpointing
  File "/Users/lucifer/Downloads/Codes/epub2tts/.venv/lib/python3.11/site-packages/deepspeed/runtime/activation_checkpointing/checkpointing.py", line 26, in <module>
    from deepspeed.runtime.config import DeepSpeedConfig
  File "/Users/lucifer/Downloads/Codes/epub2tts/.venv/lib/python3.11/site-packages/deepspeed/runtime/config.py", line 42, in <module>
    from ..elasticity import (
  File "/Users/lucifer/Downloads/Codes/epub2tts/.venv/lib/python3.11/site-packages/deepspeed/elasticity/__init__.py", line 10, in <module>
    from .elastic_agent import DSElasticAgent
  File "/Users/lucifer/Downloads/Codes/epub2tts/.venv/lib/python3.11/site-packages/deepspeed/elasticity/elastic_agent.py", line 9, in <module>
    from torch.distributed.elastic.agent.server.api import log, _get_socket_with_port
ImportError: cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/Users/lucifer/Downloads/Codes/epub2tts/.venv/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py)

aedocw commented 2 months ago

Ah, that's from a contribution someone put in that skips the GPU, even when one is found, if there is not enough memory.

I would search for the line with torch.cuda.get_device_properties(0).total_memory, where it checks that it's at least 3500000000, and change that threshold to something silly like 500 just to make sure it effectively ignores that test.
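
For reference, the guard being described looks roughly like this (a sketch; the exact code in epub2tts.py or the mps branch may differ), and lowering the threshold is what makes it stop forcing the CPU path:

```python
# Rough sketch of the VRAM guard described above (the exact epub2tts code
# may differ): use the GPU only if it reports at least ~3.5 GB of memory,
# otherwise fall back to CPU with the message seen in the log.
import torch

MIN_VRAM_BYTES = 3500000000  # change to something tiny (e.g. 500) to effectively bypass the check

if torch.cuda.is_available() and torch.cuda.get_device_properties(0).total_memory > MIN_VRAM_BYTES:
    device = "cuda"
else:
    print("Not enough VRAM on GPU or CUDA not found. Using CPU")
    device = "cpu"
```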

You probably do have to use --no-deepspeed though; I've tried it on an M1 MacBook Pro and hit a different error with deepspeed. Disabling deepspeed allows me to use XTTS v2, but it seems pretty slow. Doing "sample.txt", which is only 727 characters, took a few minutes. Using MPS seemed to give roughly the same iterations per second as using the CPU.
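
Judging from the traceback above, --no-deepspeed simply keeps load_checkpoint() from ever reaching the "import deepspeed" inside init_gpt_for_inference(). A rough sketch of the equivalent on the Coqui TTS side (paths are placeholders, and this is an illustration rather than epub2tts's exact code):

```python
# Sketch of loading XTTS v2 with deepspeed disabled via the Coqui TTS API;
# this mirrors what --no-deepspeed does in epub2tts. Paths are placeholders.
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

config = XttsConfig()
config.load_json("/path/to/xtts_v2/config.json")

model = Xtts.init_from_config(config)
# use_deepspeed=False means init_gpt_for_inference() never tries "import deepspeed"
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts_v2/", use_deepspeed=False)
model.eval()
```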

AriaShishegaran commented 2 months ago

@aedocw Well, for me on a maxed-out M3 MacBook it did make some difference: the rate was 2-3 it/s and is now around 6-7. That said, it still took around 14 hours to convert one book, Structures: Or Why Things Don't Fall Down, to an audiobook using XTTS, which is abysmal in terms of overall performance. I'll report back with my time using Edge, but I'm still very interested in any alternative approaches that would make this more efficient. I rented a small instance with a single H100 GPU for an hour and even that wasn't fast enough. My main question still stands: at this stage of the technology, what should a person expect in terms of optimal performance and inference time for a normal 400-500 page book?
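
For a rough sense of scale, the figures in this thread imply a throughput of only about a dozen characters per second. A quick back-of-the-envelope check (the 14-hour figure is the approximation reported above):

```python
# Back-of-the-envelope throughput estimate from the numbers in this thread:
# 661,976 characters encoded in roughly 14 hours of XTTS on an M3 MacBook.
total_chars = 661_976
hours = 14  # approximate, as reported above

chars_per_second = total_chars / (hours * 3600)
print(f"~{chars_per_second:.1f} characters/second")  # ~13.1 chars/s

# To finish the same book in about 3 hours (the Edge/NVIDIA estimate below),
# throughput would need to be roughly:
target_hours = 3
print(f"~{total_chars / (target_hours * 3600):.0f} chars/s needed")  # ~61 chars/s
```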

aedocw commented 2 months ago

Using Edge I think a book like that would probably take 3 hours or so, and that is not dependent on your hardware.

Using XTTS on a computer with an NVIDIA GPU, I think that is again about 3 hours.

Without full deepspeed compatibility XTTS is unusable in my opinion.