STT0050: Creating conversation data

Description:

We need to create conversation data from the NS audio files by combining speaker diarization with the timestamps of the existing STT data. Use the existing speaker diarization model from pyannote.audio. The expected output is a JSON file:

```json
{
  "conversations": [
    {
      "conversation_id": 1,
      "participants": ["Speaker One", "Speaker Two"],
      "dialogue": [
        {
          "speaker": "Speaker One",
          "text": "Hello, how are you?"
        },
        {
          "speaker": "Speaker Two",
          "text": "I'm good, thank you! How about you?"
        }
      ]
    },
    {
      "conversation_id": 2,
      "participants": ["Speaker One", "Speaker Two", "Speaker Three"],
      "dialogue": [
        {
          "speaker": "Speaker One",
          "text": "Are we meeting tomorrow?"
        },
        {
          "speaker": "Speaker Three",
          "text": "Yes, let's meet at 10 AM."
        },
        {
          "speaker": "Speaker Two",
          "text": "Sounds good to me."
        }
      ]
    }
  ]
}
```

Implementation:

Extract speaker information from the audio using a speaker diarization model, which will provide time intervals and speaker identities.
Align the speaker segments with STT transcriptions to assign the correct speaker to each text based on timestamp matching.
Organize the speaker dialogues into a structured format, identifying participants in each conversation and compiling their dialogues sequentially.
Output the conversation data into a JSON format where each conversation includes a unique conversation_id, the participants, and the dialogue.

Subtasks:

[ ] Write the script for a single audio file first.
[ ] Apply speaker diarization to all audio files in NS.

Audio Diarization:
- [ ] Use a pre-trained speaker diarization model (such as pyannote or other alternatives).
- [ ] Process the full audio to identify speakers and their speaking intervals.
- [ ] Output: A data structure that provides start and end times for each speaker.
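
A minimal sketch of the diarization step for a single audio file. The `SpeakerTurn` type and both function names are hypothetical. The checkpoint name `pyannote/speaker-diarization-3.1` and the `use_auth_token` argument are assumptions that match pyannote.audio 3.x and may need adjusting for other releases.

```python
from typing import List, NamedTuple


class SpeakerTurn(NamedTuple):
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio
    speaker: str  # diarization label, e.g. "SPEAKER_00"


def turns_from_annotation(annotation) -> List[SpeakerTurn]:
    """Flatten a pyannote-style annotation into a list of turns sorted by time."""
    turns = [
        SpeakerTurn(segment.start, segment.end, label)
        for segment, _track, label in annotation.itertracks(yield_label=True)
    ]
    return sorted(turns, key=lambda t: (t.start, t.end))


def diarize(audio_path: str, hf_token: str) -> List[SpeakerTurn]:
    """Run a pre-trained pyannote pipeline on one audio file."""
    # Imported lazily so turns_from_annotation works without pyannote installed.
    from pyannote.audio import Pipeline

    # Assumption: checkpoint name and use_auth_token follow pyannote.audio 3.x.
    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1", use_auth_token=hf_token
    )
    return turns_from_annotation(pipeline(audio_path))
```

The labels returned this way are generic (such as `SPEAKER_00`), so mapping them to names like "Speaker One" would be a separate step.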

Transcription Alignment:
- [ ] Parse the existing STT transcriptions with timestamps.
- [ ] Match transcription timestamps with speaker intervals from the diarization step.
- [ ] Assign each transcription to the appropriate speaker.
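
One possible way to do the timestamp matching is to give each transcription the speaker whose intervals overlap it the most. The sketch below assumes STT segments are dicts with `start`, `end` (in seconds) and `text` keys, and that diarization turns are `(start, end, speaker)` tuples. Both shapes and the function names are hypothetical, since the actual STT data format is not specified here.

```python
from typing import Dict, List, Optional, Tuple

Turn = Tuple[float, float, str]  # (start, end, speaker), times in seconds


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(stt_segments: List[Dict], turns: List[Turn]) -> List[Dict]:
    """Label each STT segment with the speaker whose turns overlap it most."""
    labelled = []
    for seg in stt_segments:
        totals: Dict[str, float] = {}
        for start, end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], start, end)
            if shared > 0:
                totals[speaker] = totals.get(speaker, 0.0) + shared
        # None marks segments that no diarization turn covers.
        best: Optional[str] = max(totals, key=totals.get) if totals else None
        labelled.append({**seg, "speaker": best})
    return labelled
```
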
Data Structuring:
- [ ] Organize the speaker and transcription data into conversation blocks.
- [ ] Identify participants in each conversation.
- [ ] Ensure proper sequencing of dialogues for a coherent conversation flow.
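
The rule for where one conversation ends and the next begins is not defined in this task. As an illustration only, the sketch below assumes a new conversation starts after a pause longer than a hypothetical `max_gap` threshold, merges consecutive segments from the same speaker, and joins their text with a space (which may not suit every script). Input segments are assumed to be dicts with `start`, `end`, `speaker` and `text` keys.

```python
from typing import Dict, List


def build_conversations(segments: List[Dict], max_gap: float = 30.0) -> List[Dict]:
    """Group speaker-labelled segments into conversation blocks."""
    conversations: List[Dict] = []
    previous_end = None
    for seg in sorted(segments, key=lambda s: s["start"]):
        # Assumption: a pause longer than max_gap seconds starts a new conversation.
        if previous_end is None or seg["start"] - previous_end > max_gap:
            conversations.append({
                "conversation_id": len(conversations) + 1,
                "participants": [],
                "dialogue": [],
            })
        current = conversations[-1]
        # Participants are listed in order of first appearance.
        if seg["speaker"] not in current["participants"]:
            current["participants"].append(seg["speaker"])
        dialogue = current["dialogue"]
        if dialogue and dialogue[-1]["speaker"] == seg["speaker"]:
            # Same speaker keeps talking: extend the previous entry.
            dialogue[-1]["text"] += " " + seg["text"]
        else:
            dialogue.append({"speaker": seg["speaker"], "text": seg["text"]})
        # Track the latest end time seen so overlapping segments do not shrink it.
        previous_end = seg["end"] if previous_end is None else max(previous_end, seg["end"])
    return conversations
```
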
JSON Output Generation:
- [ ] Create a function to compile the conversations into a JSON structure as shown in the example.
- [ ] Export the structured conversations into a JSON file.
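
A short sketch of the export step, assuming the conversations are already a list of dicts in the shape of the example JSON. The function name `export_conversations` and the output path are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List


def export_conversations(conversations: List[Dict], output_path: str) -> Path:
    """Write conversations to a JSON file under a top-level "conversations" key."""
    path = Path(output_path)
    payload = {"conversations": conversations}
    # ensure_ascii=False keeps non-Latin transcriptions readable in the file.
    path.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

For example, `export_conversations(conversations, "conversations.json")` would write the file and return its path.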
