deepgram / deepgram-python-sdk

Official Python SDK for Deepgram's automated speech recognition APIs.
https://developers.deepgram.com
MIT License

Wrong speaker detection/ Wrong labeling of speaker #63

Closed by khemit86 1 year ago

khemit86 commented 1 year ago

Speaker detection is wrong (speakers are labeled incorrectly) when I transcribe an mp4 video file. I am using the Python SDK with the settings below:

'tier': 'enhanced', 'punctuate': True, 'diarize': True, 'utterances': True, 'utt_split': 0.3

I am attaching the expected output file and the actual output file: current_whatsapp.docx, expected_whatsapp.docx
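Option names are plain dictionary keys, so a key with a stray space inside the quotes (for example ' diarize') is a different string from 'diarize'. Whether that matters once the request is sent is not something this issue confirms. As a precaution, here is a minimal sketch that strips whitespace from the option names before they are used. `clean_options` is a hypothetical helper written for this example and is not part of the SDK.

```python
def clean_options(options):
    """Return a copy of `options` with whitespace stripped from every key.

    Hypothetical helper for illustration only; it does not call the SDK.
    """
    cleaned = {}
    for key, value in options.items():
        name = key.strip()
        if name in cleaned:
            # Two keys that differ only by whitespace would silently collide.
            raise ValueError(f"duplicate option after stripping: {name!r}")
        cleaned[name] = value
    return cleaned


# The settings from this issue, with stray spaces left in on purpose.
options = clean_options({
    "tier": "enhanced",
    "punctuate": True,
    " diarize": True,
    " utterances": True,
    " utt_split": 0.3,
})
print(sorted(options))
```

The cleaned dictionary can then be passed wherever the original settings were passed.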

Can someone solve my problem?
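To make the problem easier to see, here is a small sketch that counts how many words were assigned to each speaker and averages the 'speaker_confidence' values. It assumes the response is a plain dict with the layout shown in this issue (results, channels, alternatives, words). `summarize_speakers` is a hypothetical helper, and the two sample words are copied from the response in this issue.

```python
from collections import defaultdict


def summarize_speakers(response):
    """Map each speaker label to (word_count, mean speaker_confidence).

    Only the first channel and its first alternative are inspected.
    """
    words = response["results"]["channels"][0]["alternatives"][0]["words"]
    totals = defaultdict(lambda: [0, 0.0])  # speaker -> [count, confidence sum]
    for word in words:
        entry = totals[word.get("speaker")]
        entry[0] += 1
        entry[1] += word.get("speaker_confidence", 0.0)
    return {speaker: (count, total / count)
            for speaker, (count, total) in totals.items()}


# Tiny sample in the same shape as the real response (two words only).
sample = {"results": {"channels": [{"alternatives": [{"words": [
    {"word": "hello", "speaker": 0, "speaker_confidence": 0.06817669},
    {"word": "i'm", "speaker": 0, "speaker_confidence": 0.64976156},
]}]}]}}

summary = summarize_speakers(sample)
for speaker, (count, mean_conf) in summary.items():
    print(f"speaker {speaker}: {count} words, mean confidence {mean_conf:.3f}")
```

If only one speaker label comes back for a recording with two people, as in the part of the response shown here, the summary shows that at a glance.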

Below is the JSON response of the transcription:

{'metadata': {'transaction_key': 'deprecated', 'request_id': '2a124439-6333-4b0f-9ac7-0063b303e6ba', 'sha256': '2af7b928fe91cfca4b51126b54d19c69af3b5c39db65da7d9d87d01e74faf7ca', 'created': '2022-11-10T05:27:39.785Z', 'duration': 59.94669, 'channels': 1, 'models': ['125125fb-e391-458e-a227-a60d6426f5d6'], 'model_info': {'125125fb-e391-458e-a227-a60d6426f5d6': {'name': 'general-enhanced', 'version': '2022-05-18.0', 'tier': 'enhanced'}}}, 'results': {'channels': [{'alternatives': [{'transcript': "Hello, Kamiji. How are you? I'm fine. And you? Fine. So tell me about yourself. myself. I'm a software engineer. I'm working in a global IT app as project manager. Okay. And I have more than ten year experience in PHP. During my experience, I worked on PHP. So which projects are you working on? Right now I'm working on? Right now I'm working on single project. That is it. that is ASR means speech recognition on that. Which challenges are you facing? Right now, I'm facing some challenges related to the speaker changes. Like, when I transcribe the video into text format. Sometimes the speak speaker labeling are wrong. Okay. 
I'm disconnecting the", 'confidence': 0.94970703, 'words': [{'word': 'hello', 'start': 1.1992188, 'end': 1.5195312, 'confidence': 0.75268555, 'speaker': 0, 'speaker_confidence': 0.06817669, 'punctuated_word': 'Hello,'}, {'word': 'kamiji', 'start': 1.5195312, 'end': 1.9990234, 'confidence': 0.6906738, 'speaker': 0, 'speaker_confidence': 0.06817669, 'punctuated_word': 'Kamiji.'}, {'word': 'how', 'start': 1.9990234, 'end': 2.1582031, 'confidence': 0.9946289, 'speaker': 0, 'speaker_confidence': 0.06817669, 'punctuated_word': 'How'}, {'word': 'are', 'start': 2.1582031, 'end': 2.3183594, 'confidence': 0.98095703, 'speaker': 0, 'speaker_confidence': 0.06817669, 'punctuated_word': 'are'}, {'word': 'you', 'start': 2.3183594, 'end': 2.8183594, 'confidence': 0.9995117, 'speaker': 0, 'speaker_confidence': 0.06817669, 'punctuated_word': 'you?'}, {'word': "i'm", 'start': 3.0390625, 'end': 3.2773438, 'confidence': 0.79833984, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': "I'm"}, {'word': 'fine', 'start': 3.2773438, 'end': 3.5976562, 'confidence': 0.78344727, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'fine.'}, {'word': 'and', 'start': 3.5976562, 'end': 3.9179688, 'confidence': 0.5410156, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'And'}, {'word': 'you', 'start': 3.9179688, 'end': 4.4179688, 'confidence': 0.8601074, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'you?'}, {'word': 'fine', 'start': 4.5585938, 'end': 5.0585938, 'confidence': 0.9968262, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'Fine.'}, {'word': 'so', 'start': 5.3554688, 'end': 5.5976562, 'confidence': 0.9975586, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'So'}, {'word': 'tell', 'start': 5.5976562, 'end': 5.8359375, 'confidence': 0.93359375, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'tell'}, {'word': 'me', 'start': 5.8359375, 'end': 5.9960938, 'confidence': 
0.99902344, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'me'}, {'word': 'about', 'start': 5.9960938, 'end': 6.2382812, 'confidence': 0.9980469, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'about'}, {'word': 'yourself', 'start': 6.2382812, 'end': 6.7382812, 'confidence': 0.7758789, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'yourself.'}, {'word': 'myself', 'start': 7.6835938, 'end': 8.183594, 'confidence': 0.61417645, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'myself.'}, {'word': "i'm", 'start': 8.8828125, 'end': 9.0859375, 'confidence': 0.85961914, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': "I'm"}, {'word': 'a', 'start': 9.0859375, 'end': 9.203125, 'confidence': 0.99609375, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'a'}, {'word': 'software', 'start': 9.203125, 'end': 9.640625, 'confidence': 0.99853516, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'software'}, {'word': 'engineer', 'start': 9.640625, 'end': 10.140625, 'confidence': 0.9729004, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'engineer.'}, {'word': "i'm", 'start': 10.2421875, 'end': 10.484375, 'confidence': 0.8195801, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': "I'm"}, {'word': 'working', 'start': 10.484375, 'end': 10.84375, 'confidence': 0.94970703, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'working'}, {'word': 'in', 'start': 10.84375, 'end': 11.125, 'confidence': 0.99609375, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'in'}, {'word': 'a', 'start': 11.125, 'end': 11.203125, 'confidence': 0.6328125, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'a'}, {'word': 'global', 'start': 11.203125, 'end': 11.640625, 'confidence': 0.98828125, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'global'}, {'word': 'it', 'start': 
11.640625, 'end': 12.0, 'confidence': 0.99853516, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'IT'}, {'word': 'app', 'start': 12.0, 'end': 12.203125, 'confidence': 0.5722656, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'app'}, {'word': 'as', 'start': 12.203125, 'end': 12.703125, 'confidence': 0.9086914, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'as'}, {'word': 'project', 'start': 13.640625, 'end': 14.0, 'confidence': 0.6201172, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'project'}, {'word': 'manager', 'start': 14.0, 'end': 14.5, 'confidence': 0.9519043, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'manager.'}, {'word': 'okay', 'start': 15.2265625, 'end': 15.546875, 'confidence': 0.7788086, 'speaker': 0, 'speaker_confidence': 0.041602314, 'punctuated_word': 'Okay.'}, {'word': 'and', 'start': 15.546875, 'end': 16.046875, 'confidence': 0.79345703, 'speaker': 0, 'speaker_confidence': 0.041602314, 'punctuated_word': 'And'}, {'word': 'i', 'start': 17.09375, 'end': 17.21875, 'confidence': 0.95654297, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'I'}, {'word': 'have', 'start': 17.21875, 'end': 17.46875, 'confidence': 0.99560547, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'have'}, {'word': 'more', 'start': 17.46875, 'end': 17.65625, 'confidence': 0.99853516, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'more'}, {'word': 'than', 'start': 17.65625, 'end': 17.90625, 'confidence': 0.9995117, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'than'}, {'word': 'ten', 'start': 17.90625, 'end': 18.09375, 'confidence': 0.9941406, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'ten'}, {'word': 'year', 'start': 18.09375, 'end': 18.34375, 'confidence': 0.9902344, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'year'}, {'word': 'experience', 
'start': 18.34375, 'end': 18.84375, 'confidence': 0.9995117, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'experience'}, {'word': 'in', 'start': 18.90625, 'end': 19.140625, 'confidence': 0.99853516, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'in'}, {'word': 'php', 'start': 19.140625, 'end': 19.640625, 'confidence': 0.90063477, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'PHP.'}, {'word': 'during', 'start': 20.421875, 'end': 20.703125, 'confidence': 0.9951172, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'During'}, {'word': 'my', 'start': 20.703125, 'end': 20.90625, 'confidence': 0.9506836, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'my'}, {'word': 'experience', 'start': 20.90625, 'end': 21.40625, 'confidence': 0.96850586, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'experience,'}, {'word': 'i', 'start': 21.46875, 'end': 21.625, 'confidence': 0.9975586, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'I'}, {'word': 'worked', 'start': 21.625, 'end': 21.90625, 'confidence': 0.4501953, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'worked'}, {'word': 'on', 'start': 21.90625, 'end': 22.0625, 'confidence': 0.9453125, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'on'}, {'word': 'php', 'start': 22.0625, 'end': 22.5625, 'confidence': 0.28125, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'PHP.'}, {'word': 'so', 'start': 25.96875, 'end': 26.140625, 'confidence': 0.13635254, 'speaker': 0, 'speaker_confidence': 0.019748092, 'punctuated_word': 'So'}, {'word': 'which', 'start': 26.140625, 'end': 26.328125, 'confidence': 0.9736328, 'speaker': 0, 'speaker_confidence': 0.019748092, 'punctuated_word': 'which'}, {'word': 'projects', 'start': 26.328125, 'end': 26.78125, 'confidence': 0.9975586, 'speaker': 0, 'speaker_confidence': 0.0, 'punctuated_word': 'projects'}, 
{'word': 'are', 'start': 26.78125, 'end': 26.890625, 'confidence': 0.9995117, 'speaker': 0, 'speaker_confidence': 0.0, 'punctuated_word': 'are'}, {'word': 'you', 'start': 26.890625, 'end': 27.015625, 'confidence': 0.9824219, 'speaker': 0, 'speaker_confidence': 0.0, 'punctuated_word': 'you'}, {'word': 'working', 'start': 27.015625, 'end': 27.296875, 'confidence': 0.9975586, 'speaker': 0, 'speaker_confidence': 0.0, 'punctuated_word': 'working'}, {'word': 'on', 'start': 27.296875, 'end': 27.796875, 'confidence': 0.8391113, 'speaker': 0, 'speaker_confidence': 0.0, 'punctuated_word': 'on?'}, {'word': 'right', 'start': 28.375, 'end': 28.65625, 'confidence': 0.86621094, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'Right'}, {'word': 'now', 'start': 28.65625, 'end': 28.703125, 'confidence': 0.99316406, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'now'}, {'word': "i'm", 'start': 28.703125, 'end': 28.75, 'confidence': 0.77368164, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': "I'm"}, {'word': 'working', 'start': 28.75, 'end': 28.796875, 'confidence': 0.99853516, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'working'}, {'word': 'on', 'start': 28.796875, 'end': 28.84375, 'confidence': 0.75878906, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'on?'}, {'word': 'right', 'start': 28.84375, 'end': 28.890625, 'confidence': 0.62402344, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'Right'}, {'word': 'now', 'start': 28.890625, 'end': 28.9375, 'confidence': 0.99121094, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'now'}, {'word': "i'm", 'start': 28.9375, 'end': 29.171875, 'confidence': 0.76293945, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': "I'm"}, {'word': 'working', 'start': 29.171875, 'end': 29.53125, 'confidence': 0.99902344, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'working'}, 
{'word': 'on', 'start': 29.53125, 'end': 30.03125, 'confidence': 0.99902344, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'on'}, {'word': 'single', 'start': 30.65625, 'end': 31.015625, 'confidence': 0.74316406, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'single'}, {'word': 'project', 'start': 31.015625, 'end': 31.5, 'confidence': 0.7504883, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'project.'}, {'word': 'that', 'start': 31.5, 'end': 31.65625, 'confidence': 0.8725586, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'That'}, {'word': 'is', 'start': 31.65625, 'end': 31.84375, 'confidence': 0.95654297, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'is'}, {'word': 'it', 'start': 31.84375, 'end': 32.34375, 'confidence': 0.8676758, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'it.'}, {'word': 'that', 'start': 32.96875, 'end': 33.21875, 'confidence': 0.38134766, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'that'}, {'word': 'is', 'start': 33.21875, 'end': 33.71875, 'confidence': 0.89990234, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'is'}, {'word': 'asr', 'start': 34.0625, 'end': 34.5625, 'confidence': 0.87939453, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'ASR'}, {'word': 'means', 'start': 35.4375, 'end': 35.9375, 'confidence': 0.6923828, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'means'}, {'word': 'speech', 'start': 35.9375, 'end': 36.21875, 'confidence': 0.52246094, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'speech'}, {'word': 'recognition', 'start': 36.21875, 'end': 36.71875, 'confidence': 0.92529297, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'recognition'}, {'word': 'on', 'start': 37.71875, 'end': 37.84375, 'confidence': 0.5708008, 'speaker': 0, 'speaker_confidence': 0.61906564, 
'punctuated_word': 'on'}, {'word': 'that', 'start': 37.84375, 'end': 38.34375, 'confidence': 0.9433594, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'that.'}, {'word': 'which', 'start': 38.625, 'end': 38.875, 'confidence': 0.9770508, 'speaker': 0, 'speaker_confidence': 0.0, 'punctuated_word': 'Which'}, {'word': 'challenges', 'start': 38.875, 'end': 39.34375, 'confidence': 0.8046875, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'challenges'}, {'word': 'are', 'start': 39.34375, 'end': 39.5, 'confidence': 0.9995117, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'are'}, {'word': 'you', 'start': 39.5, 'end': 39.625, 'confidence': 0.99902344, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'you'}, {'word': 'facing', 'start': 39.625, 'end': 40.125, 'confidence': 0.99658203, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'facing?'}, {'word': 'right', 'start': 41.71875, 'end': 41.96875, 'confidence': 0.63623047, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'Right'}, {'word': 'now', 'start': 41.96875, 'end': 42.15625, 'confidence': 0.81274414, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'now,'}, {'word': "i'm", 'start': 42.15625, 'end': 42.5625, 'confidence': 0.98950195, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': "I'm"}, {'word': 'facing', 'start': 42.5625, 'end': 43.0, 'confidence': 0.9868164, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'facing'}, {'word': 'some', 'start': 43.0, 'end': 43.25, 'confidence': 0.9995117, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'some'}, {'word': 'challenges', 'start': 43.25, 'end': 43.75, 'confidence': 0.9995117, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'challenges'}, {'word': 'related', 'start': 44.03125, 'end': 44.53125, 'confidence': 0.98583984, 'speaker': 0, 'speaker_confidence': 0.57842153, 
'punctuated_word': 'related'}, {'word': 'to', 'start': 44.53125, 'end': 44.625, 'confidence': 0.9995117, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'to'}, {'word': 'the', 'start': 44.625, 'end': 44.84375, 'confidence': 0.9716797, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'the'}, {'word': 'speaker', 'start': 44.84375, 'end': 45.1875, 'confidence': 0.9902344, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'speaker'}, {'word': 'changes', 'start': 45.1875, 'end': 45.6875, 'confidence': 0.9050293, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'changes.'}, {'word': 'like', 'start': 46.34375, 'end': 46.84375, 'confidence': 0.9001465, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'Like,'}, {'word': 'when', 'start': 47.0625, 'end': 47.3125, 'confidence': 0.99902344, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'when'}, {'word': 'i', 'start': 47.3125, 'end': 47.8125, 'confidence': 0.9995117, 'speaker': 0, 'speaker_confidence': 0.5727463, 'punctuated_word': 'I'}, {'word': 'transcribe', 'start': 48.5625, 'end': 49.0625, 'confidence': 0.9313965, 'speaker': 0, 'speaker_confidence': 0.5727463, 'punctuated_word': 'transcribe'}, {'word': 'the', 'start': 49.3125, 'end': 49.56, 'confidence': 0.9580078, 'speaker': 0, 'speaker_confidence': 0.5727463, 'punctuated_word': 'the'}, {'word': 'video', 'start': 49.8125, 'end': 50.3125, 'confidence': 0.9995117, 'speaker': 0, 'speaker_confidence': 0.5727463, 'punctuated_word': 'video'}, {'word': 'into', 'start': 50.59375, 'end': 51.0, 'confidence': 0.91308594, 'speaker': 0, 'speaker_confidence': 0.5727463, 'punctuated_word': 'into'}, {'word': 'text', 'start': 51.0, 'end': 51.3125, 'confidence': 0.7885742, 'speaker': 0, 'speaker_confidence': 0.5727463, 'punctuated_word': 'text'}, {'word': 'format', 'start': 51.3125, 'end': 51.8125, 'confidence': 0.78125, 'speaker': 0, 'speaker_confidence': 0.5727463, 
'punctuated_word': 'format.'}, {'word': 'sometimes', 'start': 52.6875, 'end': 53.1875, 'confidence': 0.9589844, 'speaker': 0, 'speaker_confidence': 0.5727463, 'punctuated_word': 'Sometimes'}, {'word': 'the', 'start': 53.1875, 'end': 53.375, 'confidence': 0.49975586, 'speaker': 0, 'speaker_confidence': 0.5727463, 'punctuated_word': 'the'}, {'word': 'speak', 'start': 53.375, 'end': 53.625, 'confidence': 0.84521484, 'speaker': 0, 'speaker_confidence': 0.5727463, 'punctuated_word': 'speak'}, {'word': 'speaker', 'start': 53.84375, 'end': 54.25, 'confidence': 0.71191406, 'speaker': 0, 'speaker_confidence': 0.5727463, 'punctuated_word': 'speaker'}, {'word': 'labeling', 'start': 54.25, 'end': 54.71875, 'confidence': 0.9682617, 'speaker': 0, 'speaker_confidence': 0.5727463, 'punctuated_word': 'labeling'}, {'word': 'are', 'start': 54.71875, 'end': 54.96875, 'confidence': 0.48291016, 'speaker': 0, 'speaker_confidence': 0.5727463, 'punctuated_word': 'are'}, {'word': 'wrong', 'start': 54.96875, 'end': 55.46875, 'confidence': 0.9030762, 'speaker': 0, 'speaker_confidence': 0.098423064, 'punctuated_word': 'wrong.'}, {'word': 'okay', 'start': 56.40625, 'end': 56.90625, 'confidence': 0.77368164, 'speaker': 0, 'speaker_confidence': 0.098423064, 'punctuated_word': 'Okay.'}, {'word': "i'm", 'start': 58.0625, 'end': 58.40625, 'confidence': 0.95996094, 'speaker': 0, 'speaker_confidence': 0.098423064, 'punctuated_word': "I'm"}, {'word': 'disconnecting', 'start': 58.40625, 'end': 58.90625, 'confidence': 0.99902344, 'speaker': 0, 'speaker_confidence': 0.098423064, 'punctuated_word': 'disconnecting'}, {'word': 'the', 'start': 59.1875, 'end': 59.6875, 'confidence': 0.88964844, 'speaker': 0, 'speaker_confidence': 0.098423064, 'punctuated_word': 'the'}]}]}], 'utterances': [{'start': 1.1992188, 'end': 6.7382812, 'confidence': 0.87348634, 'channel': 0, 'transcript': "Hello, Kamiji. How are you? I'm fine. And you? Fine. 
So tell me about yourself.", 'words': [{'word': 'hello', 'start': 1.1992188, 'end': 1.5195312, 'confidence': 0.75268555, 'speaker': 0, 'speaker_confidence': 0.06817669, 'punctuated_word': 'Hello,'}, {'word': 'kamiji', 'start': 1.5195312, 'end': 1.9990234, 'confidence': 0.6906738, 'speaker': 0, 'speaker_confidence': 0.06817669, 'punctuated_word': 'Kamiji.'}, {'word': 'how', 'start': 1.9990234, 'end': 2.1582031, 'confidence': 0.9946289, 'speaker': 0, 'speaker_confidence': 0.06817669, 'punctuated_word': 'How'}, {'word': 'are', 'start': 2.1582031, 'end': 2.3183594, 'confidence': 0.98095703, 'speaker': 0, 'speaker_confidence': 0.06817669, 'punctuated_word': 'are'}, {'word': 'you', 'start': 2.3183594, 'end': 2.8183594, 'confidence': 0.9995117, 'speaker': 0, 'speaker_confidence': 0.06817669, 'punctuated_word': 'you?'}, {'word': "i'm", 'start': 3.0390625, 'end': 3.2773438, 'confidence': 0.79833984, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': "I'm"}, {'word': 'fine', 'start': 3.2773438, 'end': 3.5976562, 'confidence': 0.78344727, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'fine.'}, {'word': 'and', 'start': 3.5976562, 'end': 3.9179688, 'confidence': 0.5410156, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'And'}, {'word': 'you', 'start': 3.9179688, 'end': 4.4179688, 'confidence': 0.8601074, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'you?'}, {'word': 'fine', 'start': 4.5585938, 'end': 5.0585938, 'confidence': 0.9968262, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'Fine.'}, {'word': 'so', 'start': 5.3554688, 'end': 5.5976562, 'confidence': 0.9975586, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'So'}, {'word': 'tell', 'start': 5.5976562, 'end': 5.8359375, 'confidence': 0.93359375, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'tell'}, {'word': 'me', 'start': 5.8359375, 'end': 5.9960938, 'confidence': 0.99902344, 
'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'me'}, {'word': 'about', 'start': 5.9960938, 'end': 6.2382812, 'confidence': 0.9980469, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'about'}, {'word': 'yourself', 'start': 6.2382812, 'end': 6.7382812, 'confidence': 0.7758789, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'yourself.'}], 'speaker': 0, 'id': 'efde132f-d302-4900-b368-901c67ad5c72'}, {'start': 7.6835938, 'end': 8.183594, 'confidence': 0.61417645, 'channel': 0, 'transcript': 'myself.', 'words': [{'word': 'myself', 'start': 7.6835938, 'end': 8.183594, 'confidence': 0.61417645, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'myself.'}], 'speaker': 0, 'id': '61f3c5d5-e229-4520-9e0f-f4abf924173c'}, {'start': 8.8828125, 'end': 12.703125, 'confidence': 0.89109296, 'channel': 0, 'transcript': "I'm a software engineer. I'm working in a global IT app as", 'words': [{'word': "i'm", 'start': 8.8828125, 'end': 9.0859375, 'confidence': 0.85961914, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': "I'm"}, {'word': 'a', 'start': 9.0859375, 'end': 9.203125, 'confidence': 0.99609375, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'a'}, {'word': 'software', 'start': 9.203125, 'end': 9.640625, 'confidence': 0.99853516, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'software'}, {'word': 'engineer', 'start': 9.640625, 'end': 10.140625, 'confidence': 0.9729004, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'engineer.'}, {'word': "i'm", 'start': 10.2421875, 'end': 10.484375, 'confidence': 0.8195801, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': "I'm"}, {'word': 'working', 'start': 10.484375, 'end': 10.84375, 'confidence': 0.94970703, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'working'}, {'word': 'in', 'start': 10.84375, 'end': 11.125, 'confidence': 0.99609375, 'speaker': 0, 
'speaker_confidence': 0.64976156, 'punctuated_word': 'in'}, {'word': 'a', 'start': 11.125, 'end': 11.203125, 'confidence': 0.6328125, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'a'}, {'word': 'global', 'start': 11.203125, 'end': 11.640625, 'confidence': 0.98828125, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'global'}, {'word': 'it', 'start': 11.640625, 'end': 12.0, 'confidence': 0.99853516, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'IT'}, {'word': 'app', 'start': 12.0, 'end': 12.203125, 'confidence': 0.5722656, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'app'}, {'word': 'as', 'start': 12.203125, 'end': 12.703125, 'confidence': 0.9086914, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'as'}], 'speaker': 0, 'id': '7030b852-6e2b-4271-be1f-c000668762b0'}, {'start': 13.640625, 'end': 14.5, 'confidence': 0.78601074, 'channel': 0, 'transcript': 'project manager.', 'words': [{'word': 'project', 'start': 13.640625, 'end': 14.0, 'confidence': 0.6201172, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'project'}, {'word': 'manager', 'start': 14.0, 'end': 14.5, 'confidence': 0.9519043, 'speaker': 0, 'speaker_confidence': 0.64976156, 'punctuated_word': 'manager.'}], 'speaker': 0, 'id': '0433d626-1ab4-4e51-a7a0-e346801b2720'}, {'start': 15.2265625, 'end': 16.046875, 'confidence': 0.7861328, 'channel': 0, 'transcript': 'Okay. 
And', 'words': [{'word': 'okay', 'start': 15.2265625, 'end': 15.546875, 'confidence': 0.7788086, 'speaker': 0, 'speaker_confidence': 0.041602314, 'punctuated_word': 'Okay.'}, {'word': 'and', 'start': 15.546875, 'end': 16.046875, 'confidence': 0.79345703, 'speaker': 0, 'speaker_confidence': 0.041602314, 'punctuated_word': 'And'}], 'speaker': 0, 'id': 'f9754b97-95c0-4cfe-813b-4ed088bf4df3'}, {'start': 17.09375, 'end': 19.640625, 'confidence': 0.98147243, 'channel': 0, 'transcript': 'I have more than ten year experience in PHP.', 'words': [{'word': 'i', 'start': 17.09375, 'end': 17.21875, 'confidence': 0.95654297, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'I'}, {'word': 'have', 'start': 17.21875, 'end': 17.46875, 'confidence': 0.99560547, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'have'}, {'word': 'more', 'start': 17.46875, 'end': 17.65625, 'confidence': 0.99853516, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'more'}, {'word': 'than', 'start': 17.65625, 'end': 17.90625, 'confidence': 0.9995117, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'than'}, {'word': 'ten', 'start': 17.90625, 'end': 18.09375, 'confidence': 0.9941406, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'ten'}, {'word': 'year', 'start': 18.09375, 'end': 18.34375, 'confidence': 0.9902344, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'year'}, {'word': 'experience', 'start': 18.34375, 'end': 18.84375, 'confidence': 0.9995117, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'experience'}, {'word': 'in', 'start': 18.90625, 'end': 19.140625, 'confidence': 0.99853516, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'in'}, {'word': 'php', 'start': 19.140625, 'end': 19.640625, 'confidence': 0.90063477, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'PHP.'}], 'speaker': 0, 'id': '2555d72a-3eb6-4b03-9417-424c452d7780'}, 
{'start': 20.421875, 'end': 22.5625, 'confidence': 0.7983747, 'channel': 0, 'transcript': 'During my experience, I worked on PHP.', 'words': [{'word': 'during', 'start': 20.421875, 'end': 20.703125, 'confidence': 0.9951172, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'During'}, {'word': 'my', 'start': 20.703125, 'end': 20.90625, 'confidence': 0.9506836, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'my'}, {'word': 'experience', 'start': 20.90625, 'end': 21.40625, 'confidence': 0.96850586, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'experience,'}, {'word': 'i', 'start': 21.46875, 'end': 21.625, 'confidence': 0.9975586, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'I'}, {'word': 'worked', 'start': 21.625, 'end': 21.90625, 'confidence': 0.4501953, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'worked'}, {'word': 'on', 'start': 21.90625, 'end': 22.0625, 'confidence': 0.9453125, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'on'}, {'word': 'php', 'start': 22.0625, 'end': 22.5625, 'confidence': 0.28125, 'speaker': 0, 'speaker_confidence': 0.56206894, 'punctuated_word': 'PHP.'}], 'speaker': 0, 'id': '61a1bffa-9c00-41e5-a0cb-d060f3e2d4ca'}, {'start': 25.96875, 'end': 27.796875, 'confidence': 0.8465925, 'channel': 0, 'transcript': 'So which projects are you working on?', 'words': [{'word': 'so', 'start': 25.96875, 'end': 26.140625, 'confidence': 0.13635254, 'speaker': 0, 'speaker_confidence': 0.019748092, 'punctuated_word': 'So'}, {'word': 'which', 'start': 26.140625, 'end': 26.328125, 'confidence': 0.9736328, 'speaker': 0, 'speaker_confidence': 0.019748092, 'punctuated_word': 'which'}, {'word': 'projects', 'start': 26.328125, 'end': 26.78125, 'confidence': 0.9975586, 'speaker': 0, 'speaker_confidence': 0.0, 'punctuated_word': 'projects'}, {'word': 'are', 'start': 26.78125, 'end': 26.890625, 'confidence': 0.9995117, 'speaker': 0, 
'speaker_confidence': 0.0, 'punctuated_word': 'are'}, {'word': 'you', 'start': 26.890625, 'end': 27.015625, 'confidence': 0.9824219, 'speaker': 0, 'speaker_confidence': 0.0, 'punctuated_word': 'you'}, {'word': 'working', 'start': 27.015625, 'end': 27.296875, 'confidence': 0.9975586, 'speaker': 0, 'speaker_confidence': 0.0, 'punctuated_word': 'working'}, {'word': 'on', 'start': 27.296875, 'end': 27.796875, 'confidence': 0.8391113, 'speaker': 0, 'speaker_confidence': 0.0, 'punctuated_word': 'on?'}], 'speaker': 0, 'id': 'f9004e9d-9cde-4a64-b032-4070a409bfb6'}, {'start': 28.375, 'end': 30.03125, 'confidence': 0.87666017, 'channel': 0, 'transcript': "Right now I'm working on? Right now I'm working on", 'words': [{'word': 'right', 'start': 28.375, 'end': 28.65625, 'confidence': 0.86621094, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'Right'}, {'word': 'now', 'start': 28.65625, 'end': 28.703125, 'confidence': 0.99316406, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'now'}, {'word': "i'm", 'start': 28.703125, 'end': 28.75, 'confidence': 0.77368164, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': "I'm"}, {'word': 'working', 'start': 28.75, 'end': 28.796875, 'confidence': 0.99853516, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'working'}, {'word': 'on', 'start': 28.796875, 'end': 28.84375, 'confidence': 0.75878906, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'on?'}, {'word': 'right', 'start': 28.84375, 'end': 28.890625, 'confidence': 0.62402344, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'Right'}, {'word': 'now', 'start': 28.890625, 'end': 28.9375, 'confidence': 0.99121094, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'now'}, {'word': "i'm", 'start': 28.9375, 'end': 29.171875, 'confidence': 0.76293945, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': "I'm"}, {'word': 'working', 'start': 29.171875, 'end': 
29.53125, 'confidence': 0.99902344, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'working'}, {'word': 'on', 'start': 29.53125, 'end': 30.03125, 'confidence': 0.99902344, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'on'}], 'speaker': 0, 'id': 'f6d42af6-2873-4bac-8083-4179fcb9d676'}, {'start': 30.65625, 'end': 32.34375, 'confidence': 0.83808595, 'channel': 0, 'transcript': 'single project. That is it.', 'words': [{'word': 'single', 'start': 30.65625, 'end': 31.015625, 'confidence': 0.74316406, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'single'}, {'word': 'project', 'start': 31.015625, 'end': 31.5, 'confidence': 0.7504883, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'project.'}, {'word': 'that', 'start': 31.5, 'end': 31.65625, 'confidence': 0.8725586, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'That'}, {'word': 'is', 'start': 31.65625, 'end': 31.84375, 'confidence': 0.95654297, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'is'}, {'word': 'it', 'start': 31.84375, 'end': 32.34375, 'confidence': 0.8676758, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'it.'}], 'speaker': 0, 'id': 'dec794ef-efb7-47ac-84f7-0a69a179acf9'}, {'start': 32.96875, 'end': 33.71875, 'confidence': 0.640625, 'channel': 0, 'transcript': 'that is', 'words': [{'word': 'that', 'start': 32.96875, 'end': 33.21875, 'confidence': 0.38134766, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'that'}, {'word': 'is', 'start': 33.21875, 'end': 33.71875, 'confidence': 0.89990234, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'is'}], 'speaker': 0, 'id': '1fe6444e-bdde-4f77-9b86-39bf61f0f322'}, {'start': 34.0625, 'end': 34.5625, 'confidence': 0.87939453, 'channel': 0, 'transcript': 'ASR', 'words': [{'word': 'asr', 'start': 34.0625, 'end': 34.5625, 'confidence': 0.87939453, 'speaker': 0, 'speaker_confidence': 0.61906564, 
'punctuated_word': 'ASR'}], 'speaker': 0, 'id': '6cf98d9a-55f4-4683-9b0e-30edc887eb7a'}, {'start': 35.4375, 'end': 36.71875, 'confidence': 0.7133789, 'channel': 0, 'transcript': 'means speech recognition', 'words': [{'word': 'means', 'start': 35.4375, 'end': 35.9375, 'confidence': 0.6923828, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'means'}, {'word': 'speech', 'start': 35.9375, 'end': 36.21875, 'confidence': 0.52246094, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'speech'}, {'word': 'recognition', 'start': 36.21875, 'end': 36.71875, 'confidence': 0.92529297, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'recognition'}], 'speaker': 0, 'id': '17cb8e5d-03e2-481d-b19c-4d145e37502d'}, {'start': 37.71875, 'end': 40.125, 'confidence': 0.8987165, 'channel': 0, 'transcript': 'on that. Which challenges are you facing?', 'words': [{'word': 'on', 'start': 37.71875, 'end': 37.84375, 'confidence': 0.5708008, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'on'}, {'word': 'that', 'start': 37.84375, 'end': 38.34375, 'confidence': 0.9433594, 'speaker': 0, 'speaker_confidence': 0.61906564, 'punctuated_word': 'that.'}, {'word': 'which', 'start': 38.625, 'end': 38.875, 'confidence': 0.9770508, 'speaker': 0, 'speaker_confidence': 0.0, 'punctuated_word': 'Which'}, {'word': 'challenges', 'start': 38.875, 'end': 39.34375, 'confidence': 0.8046875, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'challenges'}, {'word': 'are', 'start': 39.34375, 'end': 39.5, 'confidence': 0.9995117, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'are'}, {'word': 'you', 'start': 39.5, 'end': 39.625, 'confidence': 0.99902344, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'you'}, {'word': 'facing', 'start': 39.625, 'end': 40.125, 'confidence': 0.99658203, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'facing?'}], 'speaker': 0, 'id': 
'40604b98-dace-46af-b209-809840722b07'}, {'start': 41.71875, 'end': 45.6875, 'confidence': 0.9342374, 'channel': 0, 'transcript': "Right now, I'm facing some challenges related to the speaker changes.", 'words': [{'word': 'right', 'start': 41.71875, 'end': 41.96875, 'confidence': 0.63623047, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'Right'}, {'word': 'now', 'start': 41.96875, 'end': 42.15625, 'confidence': 0.81274414, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'now,'}, {'word': "i'm", 'start': 42.15625, 'end': 42.5625, 'confidence': 0.98950195, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': "I'm"}, {'word': 'facing', 'start': 42.5625, 'end': 43.0, 'confidence': 0.9868164, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'facing'}, {'word': 'some', 'start': 43.0, 'end': 43.25, 'confidence': 0.9995117, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'some'}, {'word': 'challenges', 'start': 43.25, 'end': 43.75, 'confidence': 0.9995117, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'challenges'}, {'word': 'related', 'start': 44.03125, 'end': 44.53125, 'confidence': 0.98583984, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'related'}, {'word': 'to', 'start': 44.53125, 'end': 44.625, 'confidence': 0.9995117, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'to'}, {'word': 'the', 'start': 44.625, 'end': 44.84375, 'confidence': 0.9716797, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'the'}, {'word': 'speaker', 'start': 44.84375, 'end': 45.1875, 'confidence': 0.9902344, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'speaker'}, {'word': 'changes', 'start': 45.1875, 'end': 45.6875, 'confidence': 0.9050293, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'changes.'}], 'speaker': 0, 'id': 'a005d2b7-9517-488f-82d4-9d66d3187e6d'}, {'start': 46.34375, 
'end': 47.8125, 'confidence': 0.96622723, 'channel': 0, 'transcript': 'Like, when I', 'words': [{'word': 'like', 'start': 46.34375, 'end': 46.84375, 'confidence': 0.9001465, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'Like,'}, {'word': 'when', 'start': 47.0625, 'end': 47.3125, 'confidence': 0.99902344, 'speaker': 0, 'speaker_confidence': 0.57842153, 'punctuated_word': 'when'}, {'word': 'i', 'start': 47.3125, 'end': 47.8125, 'confidence': 0.9995117, 'speaker': 0, 'speaker_confidence': 0.5727463, 'punctuated_word': 'I'}], 'speaker': 0, 'id': 'ed305b95-e17c-4da2-a11f-263615a605ad'}, {'start': 48.5625, 'end': 51.8125, 'confidence': 0.8953044, 'channel': 0, 'transcript': 'transcribe the video into text format.', 'words': [{'word': 'transcribe', 'start': 48.5625, 'end': 49.0625, 'confidence': 0.9313965, 'speaker': 0, 'speaker_confidence': 0.5727463, 'punctuated_word': 'transcribe'}, {'word': 'the', 'start': 49.3125, 'end': 49.56, 'confidence': 0.9580078, 'speaker': 0, 'speaker_confidence': 0.5727463, 'punctuated_word': 'the'}, {'word': 'video', 'start': 49.8125, 'end': 50.3125, 'confidence': 0.9995117, 'speaker': 0, 'speaker_confidence': 0.5727463, 'punctuated_word': 'video'}, {'word': 'into', 'start': 50.59375, 'end': 51.0, 'confidence': 0.91308594, 'speaker': 0, 'speaker_confidence': 0.5727463, 'punctuated_word': 'into'}, {'word': 'text', 'start': 51.0, 'end': 51.3125, 'confidence': 0.7885742, 'speaker': 0, 'speaker_confidence': 0.5727463, 'punctuated_word': 'text'}, {'word': 'format', 'start': 51.3125, 'end': 51.8125, 'confidence': 0.78125, 'speaker': 0, 'speaker_confidence': 0.5727463, 'punctuated_word': 'format.'}], 'speaker': 0, 'id': 'f521d234-ad31-4ea6-9443-16e74f40b1cc'}, {'start': 52.6875, 'end': 55.46875, 'confidence': 0.7671596, 'channel': 0, 'transcript': 'Sometimes the speak speaker labeling are wrong.', 'words': [{'word': 'sometimes', 'start': 52.6875, 'end': 53.1875, 'confidence': 0.9589844, 'speaker': 0, 'speaker_confidence': 
0.5727463, 'punctuated_word': 'Sometimes'}, {'word': 'the', 'start': 53.1875, 'end': 53.375, 'confidence': 0.49975586, 'speaker': 0, 'speaker_confidence': 0.5727463, 'punctuated_word': 'the'}, {'word': 'speak', 'start': 53.375, 'end': 53.625, 'confidence': 0.84521484, 'speaker': 0, 'speaker_confidence': 0.5727463, 'punctuated_word': 'speak'}, {'word': 'speaker', 'start': 53.84375, 'end': 54.25, 'confidence': 0.71191406, 'speaker': 0, 'speaker_confidence': 0.5727463, 'punctuated_word': 'speaker'}, {'word': 'labeling', 'start': 54.25, 'end': 54.71875, 'confidence': 0.9682617, 'speaker': 0, 'speaker_confidence': 0.5727463, 'punctuated_word': 'labeling'}, {'word': 'are', 'start': 54.71875, 'end': 54.96875, 'confidence': 0.48291016, 'speaker': 0, 'speaker_confidence': 0.5727463, 'punctuated_word': 'are'}, {'word': 'wrong', 'start': 54.96875, 'end': 55.46875, 'confidence': 0.9030762, 'speaker': 0, 'speaker_confidence': 0.098423064, 'punctuated_word': 'wrong.'}], 'speaker': 0, 'id': '87a5b8f7-069c-4ffd-ac46-0ac464cdc7e0'}, {'start': 56.40625, 'end': 56.90625, 'confidence': 0.77368164, 'channel': 0, 'transcript': 'Okay.', 'words': [{'word': 'okay', 'start': 56.40625, 'end': 56.90625, 'confidence': 0.77368164, 'speaker': 0, 'speaker_confidence': 0.098423064, 'punctuated_word': 'Okay.'}], 'speaker': 0, 'id': 'f70db353-b977-4cda-94f8-819dfdb53505'}, {'start': 58.0625, 'end': 59.6875, 'confidence': 0.94954425, 'channel': 0, 'transcript': "I'm disconnecting the", 'words': [{'word': "i'm", 'start': 58.0625, 'end': 58.40625, 'confidence': 0.95996094, 'speaker': 0, 'speaker_confidence': 0.098423064, 'punctuated_word': "I'm"}, {'word': 'disconnecting', 'start': 58.40625, 'end': 58.90625, 'confidence': 0.99902344, 'speaker': 0, 'speaker_confidence': 0.098423064, 'punctuated_word': 'disconnecting'}, {'word': 'the', 'start': 59.1875, 'end': 59.6875, 'confidence': 0.88964844, 'speaker': 0, 'speaker_confidence': 0.098423064, 'punctuated_word': 'the'}], 'speaker': 0, 'id': 
'2697a274-547d-4486-abb3-9b5e7229a40c'}]}}
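
A quick way to inspect a response like the one above is to tally the speaker labels and flag words whose `speaker_confidence` is low. The sketch below assumes the response has been loaded into a Python dict with the same shape as the output shown (`results` -> `channels` -> `alternatives` -> `words`). The helper name `summarize_speakers`, the 0.2 threshold, and the two-word sample are illustrative assumptions, not part of the SDK.

```python
from collections import Counter


def summarize_speakers(response, low_confidence=0.2):
    """Count words per speaker label and collect low-confidence words.

    `low_confidence` is an arbitrary, illustrative threshold.
    """
    counts = Counter()
    uncertain = []
    for channel in response["results"]["channels"]:
        for alternative in channel["alternatives"]:
            for word in alternative.get("words", []):
                counts[word.get("speaker")] += 1
                # Treat a missing speaker_confidence as fully confident.
                if word.get("speaker_confidence", 1.0) < low_confidence:
                    uncertain.append((word["start"], word["word"]))
    return counts, uncertain


# Hand-made sample in the same shape as the response above (values rounded).
sample = {
    "results": {"channels": [{"alternatives": [{"words": [
        {"word": "hello", "start": 1.2, "speaker": 0, "speaker_confidence": 0.07},
        {"word": "fine", "start": 3.3, "speaker": 0, "speaker_confidence": 0.65},
    ]}]}]}
}

counts, uncertain = summarize_speakers(sample)
print(dict(counts))  # words per speaker label
print(uncertain)     # (start time, word) pairs below the threshold
```

In the portions of the response visible here every word carries `'speaker': 0`, so a tally like this would show a single key, which matches the report that no speaker change is detected.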

geekchick commented 1 year ago

Hi, I was unavailable for a few days. Can you please send me the file you're trying to transcribe?

khemit86 commented 1 year ago

@geekchick The file is 11 MB, so it won't upload here. Can you please share your email ID or another way for me to send you the file?

geekchick commented 1 year ago

Hi @khemit86 is there any way you can host the file online and share the link here?

geekchick commented 1 year ago

Or, take a snippet of the same video so the size isn't as large.

khemit86 commented 1 year ago

@geekchick Below is the link to the video file:

https://secure.missoulaboneandjoint.com/whatsapp.mp4 (you can download the video from here)

Thanks for your response and help.

khemit86 commented 1 year ago

Hello @geekchick

It's treating the new speaker as the first speaker and then adding it as part of the other person's dialogue, so it doesn't recognize the change of speaker at all. I could tell the difference in speakers. It's very strange for it to be continuously identifying speaker0 but breaking up his speech. This should be adjustable.

Sometimes the speaker detection is also wrong; randomly breaking up the same speaker's speech is not correct. I also need the time at which the speaker changes.

I am attaching the files below:

actual_output.docx -> the output I get when I transcribe the video, with speaker change times.

expected_output.docx -> the correct output.

SERVICE_1_TEST1.mov -> the video file that I want to transcribe. Remote URL: https://www.globalitapp.com/SERVICE_1_TEST1.mov (you can download it).

I am using the settings below in the Python SDK:

'tier':'base','model':'general','punctuate': True,'diarize':True,'utterances':True,'utt_split':0.3,'max_speakers':2,'numerals':True,'detect_language':True

Please help me sort out the problem.
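
On the request for the time at which the speaker changes: with `diarize` enabled, each word in the response carries `speaker`, `start`, and `end` fields, so change times can be derived by collapsing consecutive words with the same label into turns. This is a minimal sketch under that assumption; `speaker_turns` and the sample words are hypothetical, and the result is only as good as the labels the API returns, so it does not correct mislabeled speakers.

```python
def speaker_turns(words):
    """Collapse a diarized word list into (speaker, start, end) turns."""
    turns = []
    for word in words:
        speaker = word.get("speaker")
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous word: extend the current turn.
            turns[-1] = (speaker, turns[-1][1], word["end"])
        else:
            # Speaker label changed: a new turn starts at this word.
            turns.append((speaker, word["start"], word["end"]))
    return turns


# Hypothetical sample words, not taken from the real response.
words = [
    {"word": "how", "start": 0.0, "end": 0.5, "speaker": 0},
    {"word": "are", "start": 0.5, "end": 0.8, "speaker": 0},
    {"word": "fine", "start": 1.5, "end": 2.0, "speaker": 1},
    {"word": "good", "start": 2.5, "end": 3.0, "speaker": 0},
]

for speaker, start, end in speaker_turns(words):
    print(f"speaker {speaker}: {start:.2f}s to {end:.2f}s")
```

The start time of every turn after the first is a speaker-change time.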

geekchick commented 1 year ago

Ok gotcha, I'm trying it now.

khemit86 commented 1 year ago

Hello @geekchick Have you tried it? Thanks

geekchick commented 1 year ago

Hi @khemit86 curious, are you using a specific language model from this list? https://developers.deepgram.com/documentation/features/language/

geekchick commented 1 year ago

Also, I see you have detect_language: True. Can you tell me which language it's detecting? For example, you'll see a JSON response like the one below. In this example it's French, "detected_language": "fr".

{
  "metadata": {
    "transaction_key": "string",
    "request_id": "string",
    "sha256": "string",
    "created": "string",
    "duration": 0,
    "channels": 0
  },
  "results": {
    "channels": [
      {
        "alternatives":[],
        "detected_language": "fr"
      }
    ]
  }
}

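
The `detected_language` field shown in the example can be read per channel once the response is available as a dict. A small sketch, assuming the structure in the example; `detected_languages` is an illustrative helper name, not an SDK function.

```python
def detected_languages(response):
    """Return the detected_language of each channel (None where absent)."""
    channels = response.get("results", {}).get("channels", [])
    return [channel.get("detected_language") for channel in channels]


# Same shape as the example response, trimmed to the relevant fields.
sample = {
    "results": {
        "channels": [
            {"alternatives": [], "detected_language": "fr"}
        ]
    }
}

print(detected_languages(sample))
```
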
geekchick commented 1 year ago

I'm not seeing any attached files. I'd like to compare my output to yours.

khemit86 commented 1 year ago

Hello @geekchick, the language is en (English). I am attaching the files: expected_output.docx actual_output.docx

Please compare your output with my output files; the expected output is the correct one. Thanks

khemit86 commented 1 year ago

Hello, please find the file URL: https://www.globalitapp.com/SERVICE_1_TEST1.mov Please download the video and review it.

On Tue, Nov 22, 2022 at 1:44 AM Tonya Camille wrote:

Hi @khemit86, the audio that you shared with me (https://secure.missoulaboneandjoint.com/whatsapp.mp4) does not appear to be the same audio as in the expected_output.docx and actual_output.docx files that you've attached.

geekchick commented 1 year ago

Ok, thanks @khemit86! Also, can you elaborate on what you mean by the wrong labeling of speakers?

khemit86 commented 1 year ago

Transcription is breaking up the speakers, but not accurately. It's treating the new speaker as the first speaker and then adding it as part of the other person's dialogue, so it doesn't recognize the change of speaker at all. Thanks

jjmaldonis commented 1 year ago

Hey @khemit86 we have improved our diarization significantly in the last year and continue to do so. If you are still having issues, please post in https://github.com/orgs/deepgram/discussions