aixplain / tts-qa


TTS QA - Quality Assessment Text to Speech Data Annotation Tool

Overview

TTS QA - Workflow

For a quick overview of the project, please watch the following video: TTS QA - Quality Assessment Text to Speech Data Annotation Tool

Step 1: Prerequisites

Details:

Before running the code, you will need to set up the following services:

After both are set up, you should enter the relevant information and credentials in the environment files:

  1. Configure the environment file: vars.env.

  2. Open the vars.env file and add the following environment variables:

    • S3_BUCKET_NAME: The name of the S3 bucket where the video and subtitles will be stored. THE BUCKET MUST BE PUBLIC.
    • S3_DATASET_DIR: The folder path inside the S3 bucket where the video will be stored.
    • AWS_ACCESS_KEY_ID: Your AWS access key ID.
    • AWS_SECRET_ACCESS_KEY: Your AWS secret access key.
    • AWS_DEFAULT_REGION: The AWS region where the S3 bucket is located.
  3. Configure the environment file: secrets.env.

  4. Copy secrets.env.example to secrets.env, then add the following environment variables:

    • TEAM_API_KEY: aiXplain Platform API key (generated on the aiXplain platform under Team Settings > Integrations).
    • HUGGINGFACE_TOKEN: Hugging Face token for using the models (available from the Hugging Face website after signing up).
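For reference, a minimal pair of environment files might look like the following. All values below are placeholders, not real credentials or defaults:

```shell
# vars.env (placeholder values)
S3_BUCKET_NAME=my-tts-qa-bucket        # must be public
S3_DATASET_DIR=datasets/tts
AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXXXXXX
AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
AWS_DEFAULT_REGION=us-east-1

# secrets.env (placeholder values)
TEAM_API_KEY=your-aixplain-team-api-key
HUGGINGFACE_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```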

Start Database

To start the PostgreSQL and Redis databases, run the following command:

bash docker-compose.sh start

This will create a Docker container for each of them.

Start Celery

Use the following command to start Celery, which is used to run tasks asynchronously:

celery -A src.service.tasks worker --loglevel=info --pool=threads

Start Backend

The following command starts the backend service that handles the data processing:

uvicorn src.service.api:app --port 8089 --reload

Start WebApp Frontend

Please note that the previous services need to be running properly for the web app to work.

1. Annotator

From the project root directory, run the annotator app with:

python -m streamlit run --server.port 8501 ./src/web_app/annotator/🏠_Intro_annotator.py

2. Admin

From the project root directory, run the admin app with:

python -m streamlit run --server.port 8502  --server.maxUploadSize 8192 ./tts-qa/src/web_app/admin/🏠_Intro_admin.py

You may use different ports, provided they are open.

You can upload a CSV file containing the text and a ZIP file containing the recordings. An example file can be downloaded from the frontend to see the expected format. You may also extract the start and end IDs of the recordings from the file names by providing a regex filter. After the files are uploaded, processing starts automatically. Once the initial processing ends (visible through Celery), start the trimming script:

python ./tts-qa/src/utils/trim_asr.py
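The regex-based ID extraction mentioned above can be sketched as follows. The filename convention and the regex here are assumptions for illustration, not the tool's actual defaults:

```python
import re

# Hypothetical filename convention: "<speaker>_<start_id>_<end_id>.zip"
ID_RANGE = re.compile(r"_(\d+)_(\d+)\.zip$")

def extract_id_range(filename):
    """Return (start_id, end_id) parsed from a recording archive name."""
    match = ID_RANGE.search(filename)
    if match is None:
        raise ValueError(f"no id range found in {filename!r}")
    return int(match.group(1)), int(match.group(2))

print(extract_id_range("speaker01_0001_0250.zip"))  # (1, 250)
```

A different naming scheme only requires swapping the regex, as long as it keeps two capturing groups for the start and end IDs.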

How to dump and restore the database

The following command generates a backup dump of the database, named with the current timestamp, which you may save to S3:

docker exec -t postgres_container_dev pg_dump -U postgres  dev_tts_db > dump_`date +%Y-%m-%d"_"%H_%M_%S`.sql

To restore a dump, pipe it back into psql:

cat dump_2023-08-08_10_16_24.sql | docker exec -i postgres_container_dev psql -U postgres dev_tts_db

The commands above target postgres_container_dev; replace dev with prod to operate on the production container.
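The dump and an S3 upload can be tied together in a small helper script. The db_backups/ prefix and the use of the AWS CLI are assumptions, not part of the repo:

```shell
# Build the timestamped dump filename used above.
dump_name() {
  echo "dump_$(date +%Y-%m-%d_%H_%M_%S).sql"
}

DUMP=$(dump_name)
# On the host (uncomment to run; assumes Docker and a configured AWS CLI):
#   docker exec -t postgres_container_dev pg_dump -U postgres dev_tts_db > "$DUMP"
#   aws s3 cp "$DUMP" "s3://$S3_BUCKET_NAME/db_backups/$DUMP"
```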

Get Duration script

Here are some queries that provide insights about the data. PostgreSQL's built-in ROUND(numeric, int) does not accept double precision values, so a small helper function is defined first.

CREATE FUNCTION ROUND(float, int) RETURNS NUMERIC AS $f$
  SELECT ROUND(CAST($1 AS numeric), $2)
$f$ LANGUAGE SQL IMMUTABLE;

SELECT dataset.name AS dataset_name,
       ROUND(SUM(sample.trimmed_audio_duration) / 60, 2) AS minutes,
       ROUND(SUM(sample.trimmed_audio_duration) / 3600, 2) AS hours
FROM sample
JOIN dataset ON sample.dataset_id = dataset.id
WHERE dataset.name NOT LIKE '%English%' AND dataset.name NOT LIKE '%German%'
GROUP BY dataset.name
ORDER BY dataset.name;

Sum of the duration of all the samples in a dataset

SELECT SUM(sample.trimmed_audio_duration) / 60 / 60 as duration_after_trimming
FROM sample
JOIN dataset ON sample.dataset_id = dataset.id
WHERE dataset.name LIKE '%' || 'English' || '%';
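The seconds-to-minutes/hours arithmetic used in these queries can be checked with a quick sketch (the durations below are made up):

```python
# Seconds-to-minutes/hours conversion mirroring the SQL above,
# with made-up trimmed_audio_duration values (in seconds).
durations = [310, 1205, 90]

total_seconds = sum(durations)
minutes = round(total_seconds / 60, 2)
hours = round(total_seconds / 3600, 2)

print(minutes, hours)  # 26.75 0.45
```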

Sum of the duration of all the samples for each dataset.language

SELECT dataset.language AS language,
       ROUND(SUM(sample.trimmed_audio_duration) / 60, 2) AS minutes,
       ROUND(SUM(sample.trimmed_audio_duration) / 3600, 2) AS hours
FROM sample
JOIN dataset ON sample.dataset_id = dataset.id
WHERE dataset.name NOT LIKE '%English (A%'
GROUP BY dataset.language
ORDER BY dataset.language;