DrewThomasson / ebook2audiobook

Generates an audiobook with chapters and ebook metadata using Calibre and Coqui XTTS, with optional voice cloning and support for multiple languages.
MIT License
audiobooks chinese docker english epub gradio linux mac multilingual tts voice-cloning windows xtts

📚 ebook2audiobook

Convert eBooks to audiobooks with chapters and metadata using Calibre and Coqui XTTS. Supports optional voice cloning and multiple languages!

🖥️ Web GUI Interface

[Image: demo_web_gui]

🌟 Features

🤗 Hugging Face space demo

Free Google Colab

🛠️ Requirements

🔧 Installation Instructions

  1. Install Python 3.x from Python.org.

  2. Install Calibre:

    • Ubuntu: sudo apt-get install -y calibre
    • macOS: brew install calibre
    • Windows (Admin PowerShell): choco install calibre
  3. Install FFmpeg:

    • Ubuntu: sudo apt-get install -y ffmpeg
    • macOS: brew install ffmpeg
    • Windows (Admin PowerShell): choco install ffmpeg
  4. Optional: Install MeCab (for non-Latin languages):

    • Ubuntu: sudo apt-get install -y mecab libmecab-dev mecab-ipadic-utf8
    • macOS: brew install mecab mecab-ipadic
    • Windows: install MeCab manually from the MeCab website (note: Japanese support is limited)
  5. Install Python packages:

    pip install coqui-tts==0.24.2 pydub nltk beautifulsoup4 ebooklib tqdm gradio==4.44.0
    
    python -m nltk.downloader punkt
    python -m nltk.downloader punkt_tab

    For non-Latin languages:

    pip install mecab mecab-python3 unidic
    
    python -m unidic download
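
After the installation steps, a quick check can confirm that the command-line tools are reachable. The following is a minimal sketch and not part of the project: `missing_tools` is a hypothetical helper, and it assumes that Calibre exposes its `ebook-convert` command and FFmpeg its `ffmpeg` command on the PATH.

```python
import shutil


def missing_tools(tools=("ebook-convert", "ffmpeg"), which=shutil.which):
    """Return the names in `tools` that cannot be found on the PATH.

    `which` is injectable so the check can be exercised without the
    tools installed; by default it is the standard shutil.which.
    """
    return [name for name in tools if which(name) is None]


# Prints an empty list when every tool was found.
print("Missing tools:", missing_tools())
```

An empty list suggests the Calibre and FFmpeg steps succeeded; any name that is listed points back to the corresponding install step.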

🌐 Supported Languages

Specify the language code when running the script in headless mode.
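
The codes accepted by `--language` are listed in the help text of `app.py`. As a rough sketch, a wrapper script could validate a code before starting a long conversion. `SUPPORTED_LANGUAGES` is copied from that help text, while `check_language` is a hypothetical helper and not part of the project.

```python
# Language codes copied from the `--language` entry in `python app.py -h`.
SUPPORTED_LANGUAGES = (
    "en", "es", "fr", "de", "it", "pt", "pl", "tr",
    "ru", "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko",
)


def check_language(code="en"):
    """Return the normalized code, or raise ValueError if it is not listed."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"Unsupported language {code!r}; expected one of: "
            + ", ".join(SUPPORTED_LANGUAGES)
        )
    return normalized
```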

🚀 Usage

🖥️ Launching Gradio Web Interface

  1. Run the Script:

    python app.py
  2. Open the Web App: Click the URL printed in the terminal to open the web app and convert eBooks.

  3. For a Public Link: Append --share True to the command, for example: python app.py --share True

    • For More Parameters: Run python app.py -h

📝 Basic Headless Usage

python app.py --headless True --ebook <path_to_ebook_file> --voice [path_to_voice_file] --language [language_code]
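
The same call can be driven from Python, for example to convert several books in a loop. This is a minimal sketch that only uses the flags shown above. The helper names `build_headless_command` and `convert` are hypothetical, and the sketch assumes it is run from the directory that contains `app.py`.

```python
import subprocess
import sys


def build_headless_command(ebook, voice=None, language="en"):
    """Build the argv list for a basic headless conversion."""
    command = [sys.executable, "app.py", "--headless", "True", "--ebook", str(ebook)]
    if voice is not None:
        # --voice is optional; the help text says a default voice is used without it.
        command += ["--voice", str(voice)]
    command += ["--language", language]
    return command


def convert(ebook, voice=None, language="en"):
    """Run the conversion and raise CalledProcessError if app.py fails."""
    subprocess.run(build_headless_command(ebook, voice, language), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.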

🧩 Headless Custom XTTS Model Usage

python app.py --headless True --use_custom_model True --ebook <ebook_file_path> --voice <target_voice_file_path> --language <language> --custom_model <custom_model_path> --custom_config <custom_config_path> --custom_vocab <custom_vocab_path>

🧩 Headless Custom XTTS Model Usage with a Zip Link to an XTTS Fine-Tuned Model 🌐

python app.py --headless True --use_custom_model True --ebook <ebook_file_path> --voice <target_voice_file_path> --language <language> --custom_model_url <custom_model_URL_ZIP_path>
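
The two custom-model variants differ only in which flags follow `--use_custom_model True`: either a zip URL or three local files. The sketch below builds those extra flags and rejects an incomplete set early. `custom_model_args` is a hypothetical helper. Letting the URL take precedence is an assumption based on the help text, which says the URL "will be used if provided".

```python
def custom_model_args(model=None, config=None, vocab=None, model_url=None):
    """Return the extra flags for a custom XTTS model.

    Accepts either a zip URL or all three local files.
    """
    if model_url is not None:
        return ["--use_custom_model", "True", "--custom_model_url", model_url]
    paths = {"--custom_model": model, "--custom_config": config, "--custom_vocab": vocab}
    missing = [flag for flag, value in paths.items() if value is None]
    if missing:
        raise ValueError("Missing custom model arguments: " + ", ".join(missing))
    args = ["--use_custom_model", "True"]
    for flag, value in paths.items():
        args += [flag, str(value)]
    return args
```

The returned list is meant to be appended to a headless command such as `python app.py --headless True --ebook <ebook_file_path>`.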

🔍 Detailed Guide with a List of All Parameters

python app.py -h

Convert eBooks to Audiobooks using a Text-to-Speech model. You can either launch the Gradio interface or run the script in headless mode for direct conversion.

options:
  -h, --help            show this help message and exit
  --share SHARE         Set to True to enable a public shareable Gradio link. Defaults to False.
  --headless HEADLESS   Set to True to run in headless mode without the Gradio interface. Defaults to False.
  --ebook EBOOK         Path to the ebook file for conversion. Required in headless mode.
  --voice VOICE         Path to the target voice file for TTS. Optional, uses a default voice if not provided.
  --language LANGUAGE   Language for the audiobook conversion. Options: en, es, fr, de, it, pt, pl, tr, ru, nl, cs, ar, zh-cn, ja, hu, ko. Defaults to English (en).
  --use_custom_model USE_CUSTOM_MODEL
                        Set to True to use a custom TTS model. Defaults to False. Must be True to use custom models, otherwise you'll get an error.
  --custom_model CUSTOM_MODEL
                        Path to the custom model file (.pth). Required if using a custom model.
  --custom_config CUSTOM_CONFIG
                        Path to the custom config file (config.json). Required if using a custom model.
  --custom_vocab CUSTOM_VOCAB
                        Path to the custom vocab file (vocab.json). Required if using a custom model.
  --custom_model_url CUSTOM_MODEL_URL
                        URL to download the custom model as a zip file. Optional, but will be used if provided. Examples include David Attenborough's model: 'https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/resolve/main/Finished_model_files.zip?download=true'. More XTTS fine-tunes can be found on my Hugging Face at 'https://huggingface.co/drewThomasson'.
  --temperature TEMPERATURE
                        Temperature for the model. Defaults to 0.65. Higher Tempatures will lead to more creative outputs IE: more Hallucinations. Lower Tempatures will be more monotone outputs IE: less Hallucinations.
  --length_penalty LENGTH_PENALTY
                        A length penalty applied to the autoregressive decoder. Defaults to 1.0. Not applied to custom models.
  --repetition_penalty REPETITION_PENALTY
                        A penalty that prevents the autoregressive decoder from repeating itself. Defaults to 2.0.
  --top_k TOP_K         Top-k sampling. Lower values mean more likely outputs and increased audio generation speed. Defaults to 50.
  --top_p TOP_P         Top-p sampling. Lower values mean more likely outputs and increased audio generation speed. Defaults to 0.8.
  --speed SPEED         Speed factor for the speech generation. IE: How fast the Narrerator will speak. Defaults to 1.0.
  --enable_text_splitting ENABLE_TEXT_SPLITTING
                        Enable splitting text into sentences. Defaults to True.

Example: python script.py --headless --ebook path_to_ebook --voice path_to_voice --language en --use_custom_model True --custom_model model.pth --custom_config config.json --custom_vocab vocab.json
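
The help text shows boolean options such as `--headless` taking an explicit value (`--headless True`). How `app.py` parses them is not shown here, so the following is only an illustrative sketch of one way to get that behaviour with `argparse`. The converter `str_to_bool` and the reduced parser are assumptions, and only the option names and defaults are taken from the help text.

```python
import argparse


def str_to_bool(value):
    """Parse 'True'/'False' style strings, as passed to flags such as --headless."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser():
    """Hypothetical parser covering a few of the documented options."""
    parser = argparse.ArgumentParser(prog="app.py")
    parser.add_argument("--headless", type=str_to_bool, default=False)
    parser.add_argument("--ebook")
    parser.add_argument("--language", default="en")
    parser.add_argument("--temperature", type=float, default=0.65)
    parser.add_argument("--top_k", type=int, default=50)
    parser.add_argument("--top_p", type=float, default=0.8)
    return parser


args = build_parser().parse_args(["--headless", "True", "--ebook", "book.epub"])
print(args.headless, args.temperature)  # True 0.65
```

In this sketch a bare `--headless` with no value is rejected by `argparse`, which matches the usage pattern `--headless HEADLESS`. Whether `app.py` behaves identically is an assumption.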


<details>
  <summary>⚠️ Legacy (Deprecated) Usage Instructions</summary>

## 🚀 Usage

## Legacy files have been moved to `ebook2audiobookXTTS/legacy/`

### 🖥️ Gradio Web Interface

1. **Run the Script**:
   ```bash
   python custom_model_ebook2audiobookXTTS_gradio.py
   ```
2. **Open the Web App**: Click the URL provided in the terminal to access the web app and convert eBooks.

📝 Basic Usage

python ebook2audiobook.py <path_to_ebook_file> [path_to_voice_file] [language_code]

🧩 Custom XTTS Model

python custom_model_ebook2audiobookXTTS.py <ebook_file_path> <target_voice_file_path> <language> <custom_model_path> <custom_config_path> <custom_vocab_path>

</details>

🐳 Using Docker

You can also use Docker to run the eBook to Audiobook converter. This method ensures consistency across different environments and simplifies setup.

🚀 Running the Docker Container

To run the Docker container and start the Gradio interface, use one of the following commands.

Run with CPU only:

docker run -it --rm -p 7860:7860 --platform=linux/amd64 athomasson2/ebook2audiobookxtts:huggingface python app.py

Run with GPU acceleration (NVIDIA graphics cards only):

docker run -it --rm --gpus all -p 7860:7860 --platform=linux/amd64 athomasson2/ebook2audiobookxtts:huggingface python app.py

Either command starts the Gradio interface on port 7860 (localhost:7860).
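
To check from a script whether the interface is up, it is enough to test whether something accepts connections on that port. The sketch below does this with the standard library. `is_port_open` is a hypothetical helper, and an open port only indicates that some service is listening there, not necessarily Gradio.

```python
import socket


def is_port_open(host="localhost", port=7860, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


print("Port 7860 reachable:", is_port_open())
```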

Example of using Docker in headless mode

First pull the latest image, then run the conversion with your input and output folders mounted:

docker pull athomasson2/ebook2audiobookxtts:huggingface
docker run -it --rm \
    -v $(pwd)/input-folder:/home/user/app/input_folder \
    -v $(pwd)/Audiobooks:/home/user/app/Audiobooks \
    --platform linux/amd64 \
    athomasson2/ebook2audiobookxtts:huggingface \
    python app.py --headless True --ebook /home/user/app/input_folder/YOUR_INPUT_FILE.TXT
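
In the command above, `$(pwd)` is expanded by the shell, and the ebook path must be given as the container sees it, not as the host does. When launching Docker from a script, both have to be built explicitly. The following is a minimal sketch under those assumptions: `docker_headless_command` is a hypothetical helper, the image name and container paths are copied from the command above, and `-it` is left out because a script may not have a TTY attached.

```python
from pathlib import Path

IMAGE = "athomasson2/ebook2audiobookxtts:huggingface"
APP_DIR = "/home/user/app"  # container path used in the command above


def docker_headless_command(ebook_name, input_dir="input-folder", output_dir="Audiobooks"):
    """Build the `docker run` argv for a headless conversion of one file."""
    # Docker bind mounts need absolute host paths.
    input_path = Path(input_dir).resolve()
    output_path = Path(output_dir).resolve()
    return [
        "docker", "run", "--rm",
        "-v", f"{input_path}:{APP_DIR}/input_folder",
        "-v", f"{output_path}:{APP_DIR}/Audiobooks",
        "--platform", "linux/amd64",
        IMAGE,
        "python", "app.py", "--headless", "True",
        # The file is addressed by its path inside the container.
        "--ebook", f"{APP_DIR}/input_folder/{ebook_name}",
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)`. The finished audiobook is then expected in the mounted `Audiobooks` folder on the host.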

To see the help text for the program's other parameters, run:

docker run -it --rm \
    --platform linux/amd64 \
    athomasson2/ebook2audiobookxtts:huggingface \
    python app.py -h

This prints:

starting...
usage: app.py [-h] [--share SHARE] [--headless HEADLESS] [--ebook EBOOK] [--voice VOICE]
              [--language LANGUAGE] [--use_custom_model USE_CUSTOM_MODEL]
              [--custom_model CUSTOM_MODEL] [--custom_config CUSTOM_CONFIG]
              [--custom_vocab CUSTOM_VOCAB] [--custom_model_url CUSTOM_MODEL_URL]
              [--temperature TEMPERATURE] [--length_penalty LENGTH_PENALTY]
              [--repetition_penalty REPETITION_PENALTY] [--top_k TOP_K] [--top_p TOP_P]
              [--speed SPEED] [--enable_text_splitting ENABLE_TEXT_SPLITTING]

Convert eBooks to Audiobooks using a Text-to-Speech model. You can either launch the
Gradio interface or run the script in headless mode for direct conversion.

options:
  -h, --help            show this help message and exit
  --share SHARE         Set to True to enable a public shareable Gradio link. Defaults
                        to False.
  --headless HEADLESS   Set to True to run in headless mode without the Gradio
                        interface. Defaults to False.
  --ebook EBOOK         Path to the ebook file for conversion. Required in headless
                        mode.
  --voice VOICE         Path to the target voice file for TTS. Optional, uses a default
                        voice if not provided.
  --language LANGUAGE   Language for the audiobook conversion. Options: en, es, fr, de,
                        it, pt, pl, tr, ru, nl, cs, ar, zh-cn, ja, hu, ko. Defaults to
                        English (en).
  --use_custom_model USE_CUSTOM_MODEL
                        Set to True to use a custom TTS model. Defaults to False. Must
                        be True to use custom models, otherwise you'll get an error.
  --custom_model CUSTOM_MODEL
                        Path to the custom model file (.pth). Required if using a custom
                        model.
  --custom_config CUSTOM_CONFIG
                        Path to the custom config file (config.json). Required if using
                        a custom model.
  --custom_vocab CUSTOM_VOCAB
                        Path to the custom vocab file (vocab.json). Required if using a
                        custom model.
  --custom_model_url CUSTOM_MODEL_URL
                        URL to download the custom model as a zip file. Optional, but
                        will be used if provided. Examples include David Attenborough's
                        model: 'https://huggingface.co/drewThomasson/xtts_David_Attenbor
                        ough_fine_tune/resolve/main/Finished_model_files.zip?download=tr
                        ue'. More XTTS fine-tunes can be found on my Hugging Face at
                        'https://huggingface.co/drewThomasson'.
  --temperature TEMPERATURE
                        Temperature for the model. Defaults to 0.65. Higher Tempatures
                        will lead to more creative outputs IE: more Hallucinations.
                        Lower Tempatures will be more monotone outputs IE: less
                        Hallucinations.
  --length_penalty LENGTH_PENALTY
                        A length penalty applied to the autoregressive decoder. Defaults
                        to 1.0. Not applied to custom models.
  --repetition_penalty REPETITION_PENALTY
                        A penalty that prevents the autoregressive decoder from
                        repeating itself. Defaults to 2.0.
  --top_k TOP_K         Top-k sampling. Lower values mean more likely outputs and
                        increased audio generation speed. Defaults to 50.
  --top_p TOP_P         Top-p sampling. Lower values mean more likely outputs and
                        increased audio generation speed. Defaults to 0.8.
  --speed SPEED         Speed factor for the speech generation. IE: How fast the
                        Narrerator will speak. Defaults to 1.0.
  --enable_text_splitting ENABLE_TEXT_SPLITTING
                        Enable splitting text into sentences. Defaults to True.

Example: python script.py --headless --ebook path_to_ebook --voice path_to_voice
--language en --use_custom_model True --custom_model model.pth --custom_config
config.json --custom_vocab vocab.json

🖥️ Docker GUI

[Image: demo_web_gui]

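🛠️ For Custom XTTS Models

Custom models are fine-tuned to perform better on a specific voice. See the author's Hugging Face page: https://huggingface.co/drewThomasson

To use a custom model, paste the link to its Finished_model_files.zip file, for example the David Attenborough fine-tune:

https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/resolve/main/Finished_model_files.zip?download=true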

A custom model also needs a reference audio clip of the voice, for example a reference audio clip of David Attenborough.
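
If a model archive has already been downloaded and the local-file flags are preferred over the URL, the three files named in the help text (a `.pth` model, `config.json`, and `vocab.json`) have to be located after extraction. The sketch below shows one way to do that. `extract_model_zip` is a hypothetical helper, and the internal layout of `Finished_model_files.zip` is an assumption, which is why the search is recursive.

```python
import zipfile
from pathlib import Path


def extract_model_zip(zip_path, target_dir):
    """Extract a model archive and locate the three files a custom model needs."""
    target = Path(target_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # Search recursively because the archive may wrap the files in a folder.
    found = {
        "custom_model": next(target.rglob("*.pth"), None),
        "custom_config": next(target.rglob("config.json"), None),
        "custom_vocab": next(target.rglob("vocab.json"), None),
    }
    missing = [name for name, path in found.items() if path is None]
    if missing:
        raise FileNotFoundError("Archive is missing: " + ", ".join(missing))
    return found
```

The dictionary keys mirror the `--custom_model`, `--custom_config`, and `--custom_vocab` flags, so each value can be passed to the flag of the same name.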

More details can be found on the Docker Hub page.

🌐 Fine-Tuned XTTS Models

To find existing fine-tuned XTTS models, visit Hugging Face 🌐 and search for models that include "xtts fine tune" in their names.

🎥 Demos

Rainy day voice

https://github.com/user-attachments/assets/8486603c-38b1-43ce-9639-73757dfb1031

David Attenborough voice

https://github.com/user-attachments/assets/47c846a7-9e51-4eb9-844a-7460402a20a8

📚 Supported eBook Formats

📂 Output

🛠️ Common Issues

What I need help with! 🙌

Full list of things can be found here

🙏 Special Thanks