facebookresearch / seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Why doesn't the extract_alignment function return times? #440

Open fishfree opened 2 months ago

fishfree commented 2 months ago

In the tutorial notebook, this call:
alignment_durations, _, tokenized_text_tokens = alignment_extractor.extract_alignment("LJ037-0171_sr16k.wav", en_transcription, plot=True, add_trailing_silence=False)
returns alignment_durations and _ as 1×160 matrices of the form [[x, x, x, x, ..., x]]. I expected something like SRT/VTT subtitles, with start and end times.

Why is the output structured this way, and how can I convert it into start and end times?
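
For illustration, here is a rough sketch of the kind of conversion I have in mind. It assumes each entry in alignment_durations is the number of fixed-length frames covered by the corresponding token, so start and end times would come from a running sum. The helper names (durations_to_spans, srt_timestamp) and the seconds_per_frame value of 0.02 are my own placeholders, not something taken from the library; the real frame length would need to be confirmed.

```python
from itertools import accumulate


def durations_to_spans(durations, tokens, seconds_per_frame=0.02):
    """Turn per-token durations (in frames) into (token, start_s, end_s) tuples.

    seconds_per_frame is an ASSUMED value; check the model's actual frame rate.
    """
    ends = list(accumulate(durations))   # cumulative frame count at each token's end
    starts = [0] + ends[:-1]             # each token starts where the previous one ended
    return [
        (tok, start * seconds_per_frame, end * seconds_per_frame)
        for tok, start, end in zip(tokens, starts, ends)
    ]


def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, total_ms = divmod(total_ms, 3_600_000)
    minutes, total_ms = divmod(total_ms, 60_000)
    secs, millis = divmod(total_ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"
```

If alignment_durations is a 1×N tensor, something like alignment_durations[0].tolist() would presumably give the flat list of durations to pass in, alongside tokenized_text_tokens.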