Issue Description:
Background:
Recent user interactions have highlighted a need for better speaker differentiation in RTT's offline transcripts once they are saved to the user's storage (for example, an AWS S3 bucket). One suggested approach is to use the 'uid' generated by Agora's SDK as the label for each speaker. The SDK returns this unique 'uid' through a promise whenever a participant joins a call, which makes it a viable candidate for labeling.
Problem:
The current RTT system and its documentation give no clear guidance on using this 'uid' for speaker differentiation. This omission could hinder clients who want to adopt the approach and keep them from getting full value from our RTT system.
Proposed Solution:
Technical Evaluation: The main concern is how the 'uid' is reflected in offline transcripts. For example, the uid "941847" is returned when a participant joins the call, but it is not clear to our customers/developers that this value is generated by Agora and is the same label used within the WebVTT-formatted transcript.
Documentation Update: The 'uid' labeling technique was found to be viable and to align with our platform's standards, so we should proceed with updating the RTT documentation. The update should give clear guidelines on how to use the 'uid' to label speakers in offline WebVTT transcripts, so that users can implement it easily.
Action Items:
Suggestions to Update RTT documentation: Incorporating the Agora UID into the WebVTT format, specifically by appending it to the region ID, gives developers a straightforward way to differentiate speakers.
Here’s how Agora developers can make this implementation more obvious:
Integrate UID into WebVTT Region Definitions:
A WebVTT file can declare a series of region definitions before its first cue. These can be modified to incorporate the Agora UID.
REGION
id:RegionID1_UID_941847
width:40%
lines:1
regionanchor:0%,100%
viewportanchor:10%,90%
scroll:up
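A minimal sketch of how such a region block could be generated, assuming the RegionID<n>_UID_<uid> naming proposed in this issue. The helpers `region_id_for` and `region_block` are hypothetical and are not part of the RTT or Agora API; the layout values are simply copied from the example above.

```python
def region_id_for(index: int, uid: int) -> str:
    """Build the proposed region identifier, e.g. RegionID1_UID_941847."""
    return f"RegionID{index}_UID_{uid}"


def region_block(index: int, uid: int) -> str:
    """Render a WebVTT REGION block whose id carries the Agora uid.

    Each setting is written as name:value with no space after the colon,
    because WebVTT separates region settings by whitespace.
    """
    lines = [
        "REGION",
        f"id:{region_id_for(index, uid)}",
        "width:40%",
        "lines:1",
        "regionanchor:0%,100%",
        "viewportanchor:10%,90%",
        "scroll:up",
    ]
    return "\n".join(lines)


# Hypothetical usage: the first speaker joined with uid 941847.
print(region_block(1, 941847))
```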
Modify Cue Settings:
When introducing a new cue (or line of dialogue) in the WebVTT format, developers can reference the modified region ID in the cue settings.
00:11.000 --> 00:13.000 region:RegionID1_UID_941847
Hello, this is the therapist speaking.
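As an illustration, a cue like the one above could be produced as follows. This is a sketch only: `format_timestamp` and `format_cue` are hypothetical helper names, and the region ID is assumed to follow the RegionID<n>_UID_<uid> convention from this issue. The helper always writes the hours field, which WebVTT allows but does not require.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_cue(start: float, end: float, region_id: str, text: str) -> str:
    """Render one cue whose settings point at the speaker's region."""
    timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
    return f"{timing} region:{region_id}\n{text}"


# Hypothetical usage with the region ID from the example above.
print(format_cue(11.0, 13.0, "RegionID1_UID_941847",
                 "Hello, this is the therapist speaking."))
```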
Documentation and Commenting:
Within the WebVTT file, developers can add comments (using NOTE) to explain the format to readers who might be unfamiliar with it.
NOTE
In this transcript, speakers are differentiated using both region IDs and Agora UIDs.
The format is RegionID_UID. E.g., RegionID1_UID_941847 represents speaker 1 (the therapist) with UID 941847.
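On the consuming side, a transcript that follows this convention can be mapped back to speakers. The sketch below is a minimal line-based reader, not a full WebVTT parser. The function `speakers_by_cue` and the sample text are hypothetical, and the code assumes numeric uids and the RegionID<n>_UID_<uid> naming described in this issue.

```python
import re
from typing import List, Tuple

# Assumption: uids are numeric and region IDs look like RegionID1_UID_941847.
REGION_UID = re.compile(r"region:RegionID\d+_UID_(\d+)")


def speakers_by_cue(vtt_text: str) -> List[Tuple[str, str, str]]:
    """Return (timing, uid, text) for every cue that carries a RegionID_UID setting."""
    results = []
    lines = vtt_text.splitlines()
    for i, line in enumerate(lines):
        if "-->" not in line:
            continue  # not a cue timing line
        match = REGION_UID.search(line)
        if match is None:
            continue  # cue without the proposed region convention
        timing = line.split(" region:")[0]
        # The cue payload runs until the next blank line.
        payload = []
        for text_line in lines[i + 1:]:
            if not text_line.strip():
                break
            payload.append(text_line)
        results.append((timing, match.group(1), " ".join(payload)))
    return results


sample = """WEBVTT

REGION
id:RegionID1_UID_941847
width:40%

00:11.000 --> 00:13.000 region:RegionID1_UID_941847
Hello, this is the therapist speaking.
"""

for timing, uid, text in speakers_by_cue(sample):
    print(uid, timing, text)
```

Keeping the uid as a string avoids assuming anything about how it is typed elsewhere in the pipeline.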
Labels: enhancement, documentation, RTT
Assignees: Sid Sharma (sid.sharma@agora.io), Iain (Iain@agora.io)