When exporting a transcript of a conversation in Teams as a .vtt file, some 'voice' metadata containing the speaker's screen name is present for each caption.
e.g.
WEBVTT

00:00:00.000 --> 00:00:00.800
<v Lisa Simpson>Knock knock</v>

00:00:02.100 --> 00:00:06.500
<v Homer Simpson>Who's there?</v>

00:00:10.530 --> 00:00:11.090
<v Lisa Simpson>Atish</v>
When I use webvtt to convert these captions to JSONL for analysis, I'd like to preserve this metadata for context.
current output:
desired output:
Sample code:
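As a rough illustration of the desired behaviour, here is a minimal sketch that uses only the Python standard library rather than webvtt itself. The function names `vtt_to_records` and `to_jsonl` and the JSON field names (`start`, `end`, `speaker`, `text`) are assumptions chosen for illustration, not part of any library. It handles only simple cues like the ones above and is not a full WebVTT parser.

```python
import json
import re

# Cue timing line, e.g. "00:00:00.000 --> 00:00:00.800" (the hours part is optional).
TIMING = re.compile(
    r"^((?:\d{2,}:)?\d{2}:\d{2}\.\d{3}) --> ((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})"
)
# Opening voice tag, e.g. "<v Lisa Simpson>"; captures the speaker name.
VOICE = re.compile(r"<v(?:\.[^\s>]+)?\s+([^>]+)>")
# Any tag, used to strip markup from the caption text.
TAG = re.compile(r"<[^>]+>")


def _finish(cue):
    """Turn a raw cue (timings plus payload lines) into a flat record."""
    payload = "\n".join(cue["lines"])
    voice = VOICE.search(payload)
    return {
        "start": cue["start"],
        "end": cue["end"],
        # speaker is None when the cue carries no voice tag
        "speaker": voice.group(1).strip() if voice else None,
        "text": TAG.sub("", payload).strip(),
    }


def vtt_to_records(vtt_text):
    """Return one dict per cue, keeping the speaker from the voice tag."""
    records = []
    cue = None
    # The trailing "" makes sure the last cue is flushed.
    for raw in vtt_text.splitlines() + [""]:
        line = raw.strip()
        timing = TIMING.match(line)
        if timing or not line:
            # A blank line or a new timing line ends the cue being read.
            if cue is not None:
                records.append(_finish(cue))
                cue = None
            if timing:
                cue = {"start": timing.group(1), "end": timing.group(2), "lines": []}
        elif cue is not None:
            cue["lines"].append(line)
    return records


def to_jsonl(records):
    """Serialise the records as JSON Lines, one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)
```

Calling `to_jsonl(vtt_to_records(text))` on the transcript above would give one JSON object per caption, the first being `{"start": "00:00:00.000", "end": "00:00:00.800", "speaker": "Lisa Simpson", "text": "Knock knock"}`.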