In the tutorial notebook, I run:

alignment_durations, _, tokenized_text_tokens = alignment_extractor.extract_alignment("LJ037-0171_sr16k.wav", en_transcription, plot=True, add_trailing_silence=False)

Both `alignment_durations` and the second return value (`_`) are 1×160 matrices of the form `[[x, x, x, ..., x]]`. I expected something like SRT/VTT subtitles, with explicit start and end times. Why are the results returned this way, and how can I convert them into start/end times?
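To illustrate what I'm after, here is a minimal sketch of the conversion I expected. It assumes each entry in `alignment_durations` is a per-token duration measured in spectrogram frames; the `hop_length` and `sample_rate` values and the `durations_to_times` helper are hypothetical placeholders, since the actual frame parameters depend on the model config:

```python
import numpy as np

def durations_to_times(durations, tokens, hop_length=256, sample_rate=16000):
    """Turn per-token frame durations into (token, start_s, end_s) tuples.

    Assumes `durations` is a 1xN matrix of frame counts, as returned above,
    and that hop_length / sample_rate converts frames to seconds.
    """
    durations = np.asarray(durations).flatten()        # (1, N) -> (N,)
    ends = np.cumsum(durations) * hop_length / sample_rate
    starts = np.concatenate(([0.0], ends[:-1]))        # each start = previous end
    return list(zip(tokens, starts, ends))

# Toy example with made-up tokens and durations:
tokens = ["HH", "AH", "L", "OW"]
durations = [[10, 8, 12, 20]]
for token, start, end in durations_to_times(durations, tokens):
    print(f"{token}: {start:.3f}s - {end:.3f}s")
```

Is a cumulative sum like this the intended way to recover timestamps, and what are the correct hop length and sample rate to use here?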