STT0049: stats of data acquired and trashed from audio split for each department.

Description

We need to measure how much data each department loses in the split audio function. This will tell us how many hours of audio we actually transcribe out of the original audio duration we have. If the loss is too large, we may need to update the split audio function.

Here is the Google Sheet of daily uploads to STT pecha tools: stt data upload stat. The split audio stats are being updated in sheet 3.

Completion Criteria

Stats showing the data lost in the split audio function for each department.

Implementation

Subtask

[x] Prepare a script to extract all audio details.
[x] Extract all audio details from the S3 bucket into a CSV file.
[x] Prepare a script to merge each catalog with the extracted audio details.
[x] Merge the extracted audio details with the catalog of each department.
[x] Check audio file name consistency in each department.
[x] Compare audio loss stats across departments.
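
The merge and comparison steps above can be sketched roughly as follows. This is a minimal illustration, not the script used for this task: the column names (`file_name`, `department`, `duration_sec`), the function `loss_by_department`, and the sample rows are all assumptions made up for the example. It assumes one catalog row per original audio file and one row per split segment that was kept.

```python
import csv
import io
from collections import defaultdict


def loss_by_department(catalog_rows, segment_rows):
    """Return per-department hours acquired, kept, and lost (hypothetical helper)."""
    # Sum the duration of the split segments kept for each original file.
    kept_seconds = defaultdict(float)
    for row in segment_rows:
        kept_seconds[row["file_name"]] += float(row["duration_sec"])

    # Accumulate original and kept durations per department.
    stats = defaultdict(lambda: {"original_sec": 0.0, "kept_sec": 0.0})
    for row in catalog_rows:
        dept = stats[row["department"]]
        dept["original_sec"] += float(row["duration_sec"])
        # A file with no kept segments counts as fully lost.
        dept["kept_sec"] += kept_seconds.get(row["file_name"], 0.0)

    report = {}
    for name, s in stats.items():
        lost = s["original_sec"] - s["kept_sec"]
        report[name] = {
            "original_hours": s["original_sec"] / 3600,
            "kept_hours": s["kept_sec"] / 3600,
            "lost_hours": lost / 3600,
            "loss_pct": 100 * lost / s["original_sec"] if s["original_sec"] else 0.0,
        }
    return report


# Made-up sample data standing in for the real CSV files (assumed layout).
CATALOG_CSV = """file_name,department,duration_sec
a.wav,dept_a,3600
b.wav,dept_a,1800
c.wav,dept_b,7200
"""
SEGMENTS_CSV = """file_name,duration_sec
a.wav,1800
a.wav,900
b.wav,1800
c.wav,3600
"""

catalog = list(csv.DictReader(io.StringIO(CATALOG_CSV)))
segments = list(csv.DictReader(io.StringIO(SEGMENTS_CSV)))
report = loss_by_department(catalog, segments)
for department, row in sorted(report.items()):
    print(department, row)
```

The join is on the file name, which is why the file name consistency check matters: a mismatched name would make a file look fully lost even if its segments exist.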

OpenPecha / stt-split-audio

STT0049: stats of data acquired and trashed from audio split for each department. #19
