clamsproject / aapb-annotations

Repository to store manual annotation dataset developed for CLAMS-AAPB collaboration

gold format for speaker diarization #96

Open keighrim opened 1 month ago

keighrim commented 1 month ago

New Feature Summary

It seems that we could relatively easily create some gold evaluation data for the speaker diarization (SD) problem by combining the time-sync annotation and the speaker turn markers in our "gold" transcript files.

Related

There's the "cleaner" code that removes the speaker markers (https://github.com/clamsproject/clams-utils/issues/2); we should be able to "reverse" that functionality to recover the speaker markers and associate them with the time frames of their corresponding series of utterances.
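Assuming the markers look something like an upper-case speaker label followed by a colon at the start of each turn (the exact format the cleaner handles may differ), a minimal sketch of the reversed extraction might be:

```python
import re

# Hypothetical marker pattern: an upper-case speaker label followed by a
# colon, e.g. "INTERVIEWER: So tell me ...". The real format handled by
# the cleaner in clams-utils may differ.
SPEAKER_RE = re.compile(r'([A-Z][A-Z .\-]*):\s*')

def extract_turns(line):
    """Split a transcript line into (speaker, text) pairs.

    Any text before the first marker is attributed to speaker None.
    """
    turns = []
    parts = SPEAKER_RE.split(line)
    # parts = [pre-marker text, speaker1, text1, speaker2, text2, ...]
    if parts[0].strip():
        turns.append((None, parts[0].strip()))
    for i in range(1, len(parts) - 1, 2):
        turns.append((parts[i], parts[i + 1].strip()))
    return turns
```

A line with two speakers then yields two turns, e.g. `extract_turns("A: you jim B: I am good")` gives `[('A', 'you jim'), ('B', 'I am good')]`, which could then be paired with the line's time span from the sync annotation.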

Alternatives

No response

Additional context

No response

keighrim commented 1 month ago

Since the text segmentation (lines) used in the sync annotation data doesn't exactly match the speaker turns, we need some additional steps to decide the turn boundaries within text lines where two speakers' utterances are mixed or overlapping. A few ideas:

  1. use the majority speaker as "the" speaker, e.g., [ A:[you jim] B:[I am good] ] << mark the whole line as B (B has more tokens than A). For a 50-50 split, we could use some arbitrary assignment scheme: total random assignment, always pick the first speaker, etc.
  2. use the token counts to divide the total time duration proportionally (e.g., A spoke 2 tokens, B spoke 2 tokens, and the total annotation span is 1s-3s for those 4 tokens << A: 1-2s, B: 2-3s)
  3. actually run some forced alignment (FA) algorithm to find the best model prediction and use it as "silver" data
  4. use an FA algorithm, manually review the results, and make them fully "gold"
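The first two heuristics could be sketched roughly as below, assuming each synced line comes with a start/end time and a list of (speaker, tokens) turns; the function names are illustrative, not existing code:

```python
from collections import Counter

def majority_speaker(turns):
    """Idea 1: label the whole line with the speaker holding the most tokens.

    `turns` is a list of (speaker, tokens) pairs, e.g.
    [('A', ['you', 'jim']), ('B', ['I', 'am', 'good'])].
    On a 50-50 tie, this arbitrarily picks the first speaker seen
    (one of the tie-breaking schemes mentioned above).
    """
    counts = Counter()
    for speaker, tokens in turns:
        counts[speaker] += len(tokens)
    # max() returns the first key with the maximal count, so ties
    # resolve to the earliest speaker in the line.
    return max(counts, key=counts.get)

def split_by_tokens(turns, start, end):
    """Idea 2: divide the line's time span proportionally to token counts.

    Returns (speaker, seg_start, seg_end) triples. E.g. A spoke 2 tokens
    and B spoke 2 tokens over 1.0-3.0s -> A: 1.0-2.0s, B: 2.0-3.0s.
    """
    total = sum(len(tokens) for _, tokens in turns)
    spans, cursor = [], start
    for speaker, tokens in turns:
        seg_end = cursor + (end - start) * len(tokens) / total
        spans.append((speaker, cursor, seg_end))
        cursor = seg_end
    return spans
```

Idea 2 keeps both speakers in the gold data at the cost of assuming uniform token duration, while idea 1 is simpler but throws away the minority speaker's time entirely; which trade-off is acceptable probably depends on how the SD evaluation scores boundary errors.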