aflorithmic / apiaudio-python

api.audio Python SDK
https://www.api.audio
MIT License

This repo is no longer in active development; please use the audiostack SDK to continue using api.audio.

api.audio logo

apiaudio - Python SDK


apiaudio is the official api.audio Python 3 SDK. This SDK provides easy access to the api.audio API for applications written in Python.


🧐 About

This repository is actively maintained by Aflorithmic Labs. For examples, recipes and api reference see the api.audio docs. Feel free to get in touch with any questions or feedback!

:book: Changelog

You can view our updated Changelog here.

:speedboat: Quickstarts

Get started with our quickstart recipes.

🏁 Getting Started

Installation

You don't need this source code unless you want to modify it. If you want to use the package, just run:

pip install apiaudio -U
# or
pip3 install apiaudio -U

Install from source with:

python setup.py install
# or
python3 setup.py install

Prerequisites

Python 3.6+

🚀 Hello World

Create a file hello.py

touch hello.py

Authentication

This library needs to be configured with your account's api-key, which is available in your api.audio Console. Import the apiaudio package and set apiaudio.api_key to the api-key you got from the console:

import apiaudio
apiaudio.api_key = "your-key"

Create Text to audio in 4 steps

Let's create our first audio asset.

✍️ Create a new script, our scriptText will be the text that is later synthesized.

script = apiaudio.Script.create(scriptText="Hello world")
print(script)

🎤 Render the scriptText that was created in the previous step. Let's use the voice Aria.

response = apiaudio.Speech.create(scriptId=script["scriptId"], voice="Aria")
print(response)

🎧 Now let's join the speech we just created with a sound template.

response = apiaudio.Mastering.create(
    scriptId=script.get("scriptId"),
    soundTemplate="jakarta"
)
print(response)

Download the final audio asset to your current working directory:

filepath = apiaudio.Mastering.download(scriptId=script["scriptId"], destination=".")
print(filepath)

Easy, right? 🔮 This is the final hello.py file.

import apiaudio
apiaudio.api_key = "your-key"

# script creation
script = apiaudio.Script.create(scriptText="Hello world")

# speech creation
response = apiaudio.Speech.create(scriptId=script["scriptId"], voice="Aria")

print(response)

# mastering process
response = apiaudio.Mastering.create(
    scriptId=script.get("scriptId"),
    soundTemplate="jakarta"
)
print(response)

# download
filepath = apiaudio.Mastering.download(scriptId=script["scriptId"], destination=".")
print(filepath)

Now let's run the code:

python hello.py
# or
python3 hello.py

Once this has completed, find the downloaded audio asset and play it! :sound: :sound: :sound:

📑 Documentation

Import

import apiaudio

Authentication

The library needs to be configured with your account's secret key, which is available in your Aflorithmic Dashboard. Set apiaudio.api_key to the api-key you got from the dashboard:

apiaudio.api_key = "your-key"

Authentication with environment variable (recommended)

You can also authenticate using the apiaudio_key environment variable; the apiaudio SDK will pick it up automatically. To set it up, open the terminal and type:

export apiaudio_key=<your-key>

If you set both the environment variable and apiaudio.api_key, the apiaudio.api_key value takes precedence.
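To illustrate that precedence rule, here is a minimal sketch. Note that resolve_api_key is a hypothetical helper written for this example, not part of the SDK:

```python
import os

def resolve_api_key(explicit_key=None):
    """Hypothetical helper mirroring the SDK's precedence rule:
    a key set explicitly in code wins over the apiaudio_key
    environment variable."""
    return explicit_key or os.environ.get("apiaudio_key")
```

With only the environment variable set, the env value is used; once a key is assigned in code, that value is used instead.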

Super Organizations

In order to control a child organization of yours, use the following method to assume that organization's id.

Set the child organization id to None to stop assuming an organization. Subsequent calls to the API will use your own organization id.

import apiaudio

apiaudio.set_assume_org_id('child_org_id')

# Stop using
apiaudio.set_assume_org_id(None)

See organization resource for more operations you can perform about your organization.

Resource Usage

There are two approaches to using the resources.

The recommended approach is to import all resources directly from apiaudio:

import apiaudio
apiaudio.Script.create()

Alternatively, you can import the resource classes you want to use first, and then use the resource methods. For example, to use Script, we could do:

from apiaudio import Script
Script.create()

The same logic applies to the other resources (Speech, Voice, Sound, ...).

Organization resource

The Organization resource/class allows you to perform some data retrieval about your organization and your child organizations.

Organization methods are:

Script resource

The Script resource/class allows you to create, retrieve and list scripts. Learn more about scripts here.

Script methods are:

Speech resource

Speech allows you to do Text-To-Speech (TTS) with our API using all the voices available. Use it to create a speech audio file from your script.

Speech methods are:

Voice resource

Voice allows you to retrieve a list of the available voices from our API.

Voice methods are:

Sound resource

Sound allows you to design your own sound template from a script and a background track. In order to get a sound template/project, make sure you requested speech for your script resource first.

Sound methods are:

Mastering resource

Mastering allows you to create and retrieve a mastered audio file of your script. A mastered version contains the speech of the script, a background track, personalised parameters for your audience and a mastering process to enhance the audio quality of the whole track. In order to get a mastered audio file, make sure you requested speech for your script resource first.

Mastering methods are:

Media resource

Media allows you to retrieve all the files available in api.audio for your organization.

Media methods are:

SyncTTS resource

SyncTTS allows you to do Synchronous Text-To-Speech (TTS) with our API using all the voices available. Use it to create a speech audio file from a text and a voice name. The response contains wave bytes ready to be played or written to a file.

SyncTTS methods are:

Birdcache resource

Birdcache is a caching service provided by api.audio that stores data on api.audio servers for future use. This allows you to retrieve your speech files on the fly.

Birdcache methods are:

Pronunciation Dictionary resource

Often when working with TTS, the models can fail to accurately pronounce specific words; brands, names and locations, for example, are commonly mispronounced. As a first attempt to fix this we have introduced our lexi flag, which works in a similar way to SSML. For example, adding <!peadar> instead of Peadar (who is one of our founders) to your script will cause the model to produce an alternative pronunciation of this name. This is particularly useful in cases where words can have multiple pronunciations, for example the cities ‘Reading’ and ‘Nice’. In this instance, placing <!reading> and <!nice> will ensure that these are pronounced correctly, given the script:

" The city of <!nice> is a really nice place in the south of France."

If this solution does not work for you, you can instead make use of our custom (self-serve) lexi feature.

This can be used to achieve one of two things, correcting single words, or expanding acronyms. For example, you can replace all occurrences of the word Aflorithmic with “af low rhythmic” or occurrences of the word ‘BMW’ with “Bayerische Motoren Werke”. Replacement words can be supplied as plain text or an IPA phonemisation.
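Conceptually, the dictionary behaves like the toy replacement pass below. The real substitution happens server-side; apply_lexi and the rules dict are illustrative only:

```python
def apply_lexi(text: str, rules: dict) -> str:
    """Toy illustration of dictionary-based replacement: swap each
    occurrence of a word for its spoken form before synthesis."""
    for word, spoken in rules.items():
        text = text.replace(word, spoken)
    return text

rules = {
    "Aflorithmic": "af low rhythmic",   # correcting a single word
    "BMW": "Bayerische Motoren Werke",  # expanding an acronym
}
```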

Pronunciation dictionary methods are:

Preview

The effect of applying the Pronunciation Dictionary can be seen with the script.preview() method. See Script documentation for more details.

Connector resource

Resource used for monitoring 3rd party integrations. End results of the Mastering resource can be distributed to external applications through the connectors field. See the connectors documentation. List of currently supported applications:

Available methods:

Orchestrator resource

The orchestrator is used to make working with a range of audio services as easy as sending a single API request. Each route here is carefully configured to produce high-quality, easy-to-access audio assets.

Orchestrator methods are:

Webhooks

This SDK provides an easy way of verifying apiaudio webhook security headers. It is highly recommended that you verify the headers in order to protect your server from malicious attacks.

The method is:

apiaudio.Webhooks.verify(payload, sig_header, secret, tolerance)

It returns true if the header is valid; otherwise it raises an error. The parameters are: payload, the body object sent by apiaudio; sig_header, the X-Aflr-Secret value in the request headers sent by apiaudio; secret, your webhook secret (available in the apiaudio console); and tolerance, the tolerance in seconds for the header checks (defaults to 300 seconds).
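For intuition, signature checks of this kind are typically a timestamped HMAC comparison. The sketch below assumes a general mechanism and a hypothetical header format, not apiaudio's exact scheme; always use apiaudio.Webhooks.verify in production:

```python
import hashlib
import hmac
import time

def verify_signature(payload: bytes, sig_header: str, secret: str,
                     tolerance: int = 300) -> bool:
    """Illustrative verifier assuming a header shaped like
    't=<unix-ts>,v1=<hex-digest>' (hypothetical format)."""
    parts = dict(item.split("=", 1) for item in sig_header.split(","))
    timestamp, received = int(parts["t"]), parts["v1"]
    # Reject stale requests outside the tolerance window.
    if abs(time.time() - timestamp) > tolerance:
        raise ValueError("timestamp outside tolerance window")
    # Recompute the digest over "<timestamp>.<payload>" and compare
    # in constant time to avoid timing attacks.
    expected = hmac.new(secret.encode(),
                        f"{timestamp}.".encode() + payload,
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, received):
        raise ValueError("signature mismatch")
    return True
```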

Logging

By default, warnings issued by the API are logged to the console output. Additionally, some behaviors are logged at the informational level (e.g. "In progress..." indicators during longer processing times). The level of logging can be controlled by choosing from the standard levels in Python's logging library.
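For example, to keep only warnings and errors, the standard logging configuration applies. The logger name "apiaudio" is an assumption here; SDKs conventionally log under their package name:

```python
import logging

# Keep only warnings and errors from the SDK's logger.
logging.getLogger("apiaudio").setLevel(logging.WARNING)

# Or, to surface everything including "In progress..." style messages:
# logging.basicConfig(level=logging.INFO)
```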

Maintainers

Development

There is a pre-commit hook that will run before you commit a file. This is to keep the code standards high. To enable it, run make; this will set up the pre-commit hook for git. That's all! Now, every time before you commit, it will run and tell you about the standards.

If you use VSCode for committing files, you may bump into a pre-commit command not found error. That is OK; just run brew install pre-commit, or use your favorite package manager from the list here.

If you bump into a "your pip version is old" error, just ignore it and use the terminal.

If there is a problem and you are in a rush, you can add --no-verify at the end of the commit command to skip the pre-commit hooks, e.g. git commit -m 'your commit message' --no-verify

License

This project is licensed under the terms of the MIT license.