doc2audiobook.py

Extract text from a document (using textract) and convert it into natural-sounding synthesised speech with Cloud Text-to-Speech, which can use DeepMind's WaveNet models.
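The two steps described above could be wired together roughly as follows. This is a minimal sketch, not necessarily how doc2audiobook.py is implemented. It assumes a recent (2.x) release of the google-cloud-texttospeech client library and that credentials are already configured. The helper names `split_text` and `synthesize_document` are hypothetical. The chunking step is included because Cloud Text-to-Speech limits the size of a single request (5,000 bytes of input at the time of writing), so long documents have to be sent in pieces.

```python
import re


def split_text(text: str, max_bytes: int = 5000) -> list[str]:
    """Greedily pack sentences into chunks of at most max_bytes (UTF-8).

    Note: a single word longer than max_bytes is kept whole.
    """
    chunks, current = [], ""
    # Split after sentence-ending punctuation, keeping the punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        candidate = f"{current} {sentence}".strip()
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
            continue
        if current:
            chunks.append(current)
        current = ""
        # The sentence does not fit next to the previous text:
        # rebuild it word by word, flushing whenever the limit is hit.
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            if len(candidate.encode("utf-8")) <= max_bytes:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = word
    if current:
        chunks.append(current)
    return chunks


def synthesize_document(path: str, voice_name: str, out_path: str) -> None:
    """Hypothetical helper: extract text from path and write MP3 audio to out_path."""
    # Imported lazily so split_text works without these packages installed.
    import textract
    from google.cloud import texttospeech

    text = textract.process(path).decode("utf-8")  # textract returns bytes
    client = texttospeech.TextToSpeechClient()
    voice = texttospeech.VoiceSelectionParams(
        # Assumes names shaped like "en-GB-Standard-C" -> language "en-GB".
        language_code="-".join(voice_name.split("-")[:2]),
        name=voice_name,
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    with open(out_path, "wb") as out:
        for chunk in split_text(text):
            response = client.synthesize_speech(
                input=texttospeech.SynthesisInput(text=chunk),
                voice=voice,
                audio_config=audio_config,
            )
            # Appending MP3 segments back to back is a simplification.
            out.write(response.audio_content)
```

Splitting on sentence boundaries first keeps each request's audio from starting or ending mid-sentence, which tends to sound more natural than cutting at a fixed byte offset.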

Example

Input Output

Available source formats (from textract)

.csv
.doc
.docx
.eml
.epub
.gif
.jpg and .jpeg
.json
.html and .htm
.mp3
.msg
.odt
.ogg
.pdf
.png
.pptx
.ps
.rtf
.tiff
.txt
.wav
.xlsx
.xls
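The list above can also be used to pre-filter an input directory before any conversion is attempted. The sketch below is only an illustration: `SUPPORTED_EXTENSIONS`, `is_supported`, and `find_documents` are hypothetical names, and the set simply mirrors the extensions listed in this README rather than querying textract itself.

```python
from pathlib import Path

# Mirrors the formats listed in this README (assumption: matched
# case-insensitively on the file suffix only).
SUPPORTED_EXTENSIONS = {
    ".csv", ".doc", ".docx", ".eml", ".epub", ".gif", ".jpg", ".jpeg",
    ".json", ".html", ".htm", ".mp3", ".msg", ".odt", ".ogg", ".pdf",
    ".png", ".pptx", ".ps", ".rtf", ".tiff", ".txt", ".wav", ".xlsx", ".xls",
}


def is_supported(path) -> bool:
    """Return True if the file extension is one of the listed formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def find_documents(input_dir) -> list:
    """Return the supported files directly inside input_dir, sorted by path."""
    return sorted(
        p for p in Path(input_dir).iterdir() if p.is_file() and is_supported(p)
    )
```

For example, `find_documents("/doc2audiobook/data/input")` would return only the files that textract is expected to handle, so unsupported files can be reported up front instead of failing halfway through a batch.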

Prerequisites

GCP

Select or create a Google Cloud Platform project.
Enable billing for your project.
Enable the Cloud Text-to-Speech API.
Set up authentication using a Service Account.
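Once the steps above are done, a quick sanity check on the downloaded key can catch a wrong file early. The sketch below assumes the file is a standard service account key in JSON format. The function name `looks_like_service_account_key` is hypothetical, and the check only inspects a few well-known fields; it does not verify the key with Google.

```python
import json

# Fields present in a standard service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def looks_like_service_account_key(data: dict) -> bool:
    """Return True if data has the basic shape of a service account key."""
    if data.get("type") != "service_account":
        return False
    return all(data.get(field) for field in REQUIRED_FIELDS)


def check_key_file(path: str) -> bool:
    """Load a JSON key file from path and run the shape check on it."""
    with open(path, encoding="utf-8") as handle:
        return looks_like_service_account_key(json.load(handle))
```

A file that fails this check is most likely an OAuth client secret or some other credential type rather than a service account key.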

Host Machine

Docker
/doc2audiobook/data/input: directory to hold all input files.
/doc2audiobook/data/output: directory to store all output files.
/doc2audiobook/.secrets/client_secret.json: GCP authentication token.
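The host layout above can be verified before starting a container. This is a small sketch under the assumption that everything lives under a single base directory (`/doc2audiobook` in this README). The function name `missing_paths` is hypothetical.

```python
from pathlib import Path


def missing_paths(base="/doc2audiobook") -> list:
    """Return a description of every expected host path that does not exist."""
    base = Path(base)
    required = {
        "input directory": base / "data" / "input",
        "output directory": base / "data" / "output",
        "GCP credentials": base / ".secrets" / "client_secret.json",
    }
    return [
        f"{label}: {path}" for label, path in required.items() if not path.exists()
    ]
```

An empty list means all three paths are in place. Otherwise each entry names what is missing, for example `GCP credentials: /doc2audiobook/.secrets/client_secret.json`.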

Build

git clone git@github.com:danthelion/doc2audiobook.git
cd doc2audiobook
docker build -t doc2audiobook .

Run

Before running, put your documents in the input directory (/doc2audiobook/data/input), which is mounted into the container under /data.

List available voices

docker run \
-v /doc2audiobook/data:/data:rw \
-v /doc2audiobook/.secrets/client_secret.json:/.secrets/client_secret.json:ro \
doc2audiobook -list-voices

Convert all documents in the mapped input folder to audiobooks using the en-GB-Standard-C voice.

docker run \
-v /doc2audiobook/data:/data:rw \
-v /doc2audiobook/.secrets/client_secret.json:/.secrets/client_secret.json:ro \
doc2audiobook --voice en-GB-Standard-C

Convert a single document in the mapped input folder to an audiobook using the en-GB-Standard-C voice.

docker run \
-v /doc2audiobook/data:/data:rw \
-v /doc2audiobook/.secrets/client_secret.json:/.secrets/client_secret.json:ro \
doc2audiobook --voice en-GB-Standard-C --input test_input.txt
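When these conversions are launched from a script, the same command line can be assembled programmatically. The sketch below only builds the argument list shown in the examples above. `build_command` is a hypothetical helper, and the default base directory and image name are taken from this README.

```python
def build_command(voice, input_file=None, base="/doc2audiobook",
                  image="doc2audiobook") -> list:
    """Build the docker run argument list for a conversion.

    If input_file is None, every document in the mapped input folder
    is converted, as in the second example above.
    """
    cmd = [
        "docker", "run",
        "-v", f"{base}/data:/data:rw",
        "-v", f"{base}/.secrets/client_secret.json:/.secrets/client_secret.json:ro",
        image,
        "--voice", voice,
    ]
    if input_file is not None:
        cmd += ["--input", input_file]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single shell string avoids quoting problems with file names that contain spaces.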
