clamsproject / aapb-annotations

Repository to store manual annotation dataset developed for CLAMS-AAPB collaboration

gold format for speaker diarization #96

Open keighrim opened 1 month ago

keighrim commented 1 month ago

New Feature Summary

It seems that we could relatively easily create some gold evaluation data for the speaker diarization (SD) problem by combining the time-sync annotation and the speaker turn markers in our "gold" transcript files.

Related

There's the "cleaner" code that removes the speaker markers (https://github.com/clamsproject/clams-utils/issues/2); we should be able to "reverse" that functionality to recover the speaker markers and associate them with the time frames of their corresponding series of utterances.
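Assuming the markers look something like an upper-case speaker label followed by a colon at the start of each turn (the exact format the cleaner handles may differ), a minimal sketch of the reversed extraction might be:

```python
import re

# Hypothetical marker pattern: an upper-case speaker label followed by a
# colon, e.g. "INTERVIEWER: So tell me ...". The real format handled by
# the cleaner in clams-utils may differ.
SPEAKER_RE = re.compile(r'([A-Z][A-Z .\-]*):\s*')

def extract_turns(line):
    """Split a transcript line into (speaker, text) pairs.

    Any text before the first marker is attributed to speaker None.
    """
    turns = []
    parts = SPEAKER_RE.split(line)
    # parts = [pre-marker text, speaker1, text1, speaker2, text2, ...]
    if parts[0].strip():
        turns.append((None, parts[0].strip()))
    for i in range(1, len(parts) - 1, 2):
        turns.append((parts[i], parts[i + 1].strip()))
    return turns
```

A line with two speakers then yields two turns, e.g. `extract_turns("A: you jim B: I am good")` gives `[('A', 'you jim'), ('B', 'I am good')]`, which could then be paired with the line's time span from the sync annotation.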

Alternatives

No response

Additional context

No response

keighrim commented 1 month ago

Since the text segmentation (lines) used in the sync annotation data doesn't exactly match the speaker turns, we need some additional steps to decide the turn boundaries within text lines where two speakers' utterances are mixed or overlapping. A few ideas:

  1. use the majority speaker as "the" speaker, e.g., [ A:[you jim] B:[I am good] ] << mark the whole line as B (B has more tokens than A). For a 50-50 split, we could use some arbitrary assignment scheme: total random assignment, always pick the first speaker, etc.
  2. use the token counts to divide the total time duration proportionally (e.g., A spoke 2 tokens, B spoke 2 tokens, and the total annotation span is 1s-3s for those 4 tokens << A: 1-2s, B: 2-3s)
  3. actually run some forced alignment (FA) algorithm to find the best model prediction and use it as "silver" data
  4. use an FA algorithm, manually review the results, and make them fully "gold"
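The first two heuristics could be sketched roughly as below, assuming each synced line comes with a start/end time and a list of (speaker, tokens) turns; the function names are illustrative, not existing code:

```python
from collections import Counter

def majority_speaker(turns):
    """Idea 1: label the whole line with the speaker holding the most tokens.

    `turns` is a list of (speaker, tokens) pairs, e.g.
    [('A', ['you', 'jim']), ('B', ['I', 'am', 'good'])].
    On a 50-50 tie, this arbitrarily picks the first speaker seen
    (one of the tie-breaking schemes mentioned above).
    """
    counts = Counter()
    for speaker, tokens in turns:
        counts[speaker] += len(tokens)
    # max() returns the first key with the maximal count, so ties
    # resolve to the earliest speaker in the line.
    return max(counts, key=counts.get)

def split_by_tokens(turns, start, end):
    """Idea 2: divide the line's time span proportionally to token counts.

    Returns (speaker, seg_start, seg_end) triples. E.g. A spoke 2 tokens
    and B spoke 2 tokens over 1.0-3.0s -> A: 1.0-2.0s, B: 2.0-3.0s.
    """
    total = sum(len(tokens) for _, tokens in turns)
    spans, cursor = [], start
    for speaker, tokens in turns:
        seg_end = cursor + (end - start) * len(tokens) / total
        spans.append((speaker, cursor, seg_end))
        cursor = seg_end
    return spans
```

Idea 2 keeps both speakers in the gold data at the cost of assuming uniform token duration, while idea 1 is simpler but throws away the minority speaker's time entirely; which trade-off is acceptable probably depends on how the SD evaluation scores boundary errors.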