Open lv-gh opened 2 months ago
@lv-gh - thank you.
The lattice format is our internal format, and it is not documented, sorry about that.
The format of the word line is: is it the main flow, from, to, word. The lattice with overlapping time in the main flow is not valid.
Is it a bug with fix.lattice.time? - NO
Could fix.lattice.time work more correctly and return failure in your case? - YES
There is no need for a failure, but overlapping (overlapped speech) is perfectly valid and common when using pyannote diarization pipelines (and possibly others). Ideally the tool would just deal with it, inserting silence blocks/intervals starting from the farthest word end read so far, not from the end of the last (overlapped) word.
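To illustrate the idea, here is a minimal sketch in Python. It is not the actual fix.lattice.time implementation. Since the lattice format is undocumented, the sketch assumes each word is a plain (start, end, word) tuple with times in seconds; the function name fill_silence and the "<sil>" label are hypothetical and exist only for this example.

```python
from typing import List, Tuple

# Hypothetical word representation: (start, end, word), times in seconds.
# This is an assumption for illustration, not the real lattice line format.
Word = Tuple[float, float, str]


def fill_silence(words: List[Word], sil: str = "<sil>") -> List[Word]:
    """Insert silence intervals into gaps, tolerating overlapped words.

    Gaps are measured from the farthest end time seen so far, not from
    the end of the previous word, so a short word overlapped by a longer
    one does not produce a silence block on top of the longer word.
    """
    out: List[Word] = []
    farthest = 0.0  # farthest end time among all words read so far
    for start, end, word in sorted(words, key=lambda w: w[0]):
        if start > farthest:
            # Real gap: nothing read so far covers [farthest, start).
            out.append((farthest, start, sil))
        out.append((start, end, word))
        farthest = max(farthest, end)
    return out


# "yes" (0.5-1.0) is overlapped by "hello" (0.0-2.0).
example = [(0.0, 2.0, "hello"), (0.5, 1.0, "yes"), (3.0, 4.0, "world")]
result = fill_silence(example)
```

With this input, the only silence inserted is 2.0 to 3.0. Measuring from the end of the previous word ("yes", ending at 1.0) would instead produce a silence from 1.0 to 3.0 that overlaps "hello".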
Something to add to the enhancement: fix.segments is currently also incompatible with overlapped speech (segments get truncated, and under certain conditions badly), i.e. it is incompatible with pyannote diarization, which produces overlapping segments. At the very least this should be documented, IMHO.
lattice.txt:
$ fix.lattice.time -l 25 lattice.txt > lattice.fix.txt
lattice.fix.txt: