Closed jubenjum closed 4 years ago
Thanks Juan, actually this is even worse! The aligned words within an utterance are offset when noise is present. The bug is in align::Alignment::_export_phones_and_words():
s0102a-sent17 3.1975 3.2275 ah i
s0102a-sent17 3.2275 3.3875 SIL
s0102a-sent17 3.3875 3.5075 NSN
s0102a-sent17 3.5075 3.7175 r SIL
s0102a-sent17 3.7175 3.8475 iy NOISE
s0102a-sent17 3.8475 4.2075 k recall
s0102a-sent17 4.2075 4.3575 ao
s0102a-sent17 4.3575 4.4075 l
s0102a-sent17 4.4075 4.4375 m
s0102a-sent17 4.4375 4.4875 ih
s0102a-sent17 4.4875 5.1975 s missing
s0102a-sent17 5.1975 6.1675 iy
s0102a-sent17 6.1675 6.1975 n
When doing word alignments, I found that the last timestamp of an utterance is the same as the first timestamp of the next utterance, for example:
1769-143485-0006 14.0675 14.1675 to
1769-143485-0006 14.1675 14.4675 all
1769-143485-0006 14.4675 1.0975 animals
004_F_01_07_01 1.0975 1.1975 the
004_F_01_07_01 1.1975 1.6875 villagers
004_F_01_07_01 1.6875 2.1375 gather
"animals" from utterance "1769-143485-0006" has an ending timestamp of 1.0975, and "the" in utterance "004_F_01_07_01" has the same starting timestamp.
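In case it helps to spot other affected utterances, here is a minimal sketch of a sanity check that flags any record whose end time is not after its start time. It assumes the alignment file has one record per line in the form `utt_id start end word`, as in the excerpt above; `find_bad_intervals` is a hypothetical helper written for this comment, not part of abkhazia.

```python
def find_bad_intervals(lines):
    """Return (utt_id, start, end, word) for records whose end is not after their start."""
    bad = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank lines and records without a word label
        utt_id, start, end = fields[0], float(fields[1]), float(fields[2])
        word = " ".join(fields[3:])
        if end <= start:
            bad.append((utt_id, start, end, word))
    return bad


# The excerpt from this issue, one record per line.
alignment = """\
1769-143485-0006 14.0675 14.1675 to
1769-143485-0006 14.1675 14.4675 all
1769-143485-0006 14.4675 1.0975 animals
004_F_01_07_01 1.0975 1.1975 the
004_F_01_07_01 1.1975 1.6875 villagers
004_F_01_07_01 1.6875 2.1375 gather
""".splitlines()

for record in find_bad_intervals(alignment):
    print(record)
# prints ('1769-143485-0006', 14.4675, 1.0975, 'animals')
```

On the excerpt this flags only the last word of the first utterance, which matches the symptom described above. It would not catch the noise-related word offset, since those intervals are still well ordered.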
The abkhazia command that produces that bug is: