bootphon / abkhazia

ABX and kaldi experiments on speech corpora made easy
https://docs.cognitive-ml.fr/abkhazia
GNU General Public License v3.0

roll over word in word timestamp alignment #17

Closed jubenjum closed 4 years ago

jubenjum commented 5 years ago

While doing word alignments, I found that the last timestamp of an utterance is the same as the first timestamp of the next utterance. For example:

1769-143485-0006 14.0675 14.1675 to
1769-143485-0006 14.1675 14.4675 all
1769-143485-0006 14.4675 1.0975 animals
004_F_01_07_01 1.0975 1.1975 the
004_F_01_07_01 1.1975 1.6875 villagers
004_F_01_07_01 1.6875 2.1375 gather

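"animals" from utterance "1769-143485-0006" has an ending timestamp of 1.0975, and "the" in utterance "004_F_01_07_01" has the same starting timestamp. The end time of the last word has rolled over to the start time of the first word of the next utterance.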
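A quick way to spot this kind of rollover is to check that no row ends before it starts. The sketch below is not part of abkhazia: it assumes the alignment file has one whitespace-separated `utt_id start end word` row per line, and the helper name `find_rollovers` is hypothetical.

```python
def find_rollovers(lines):
    """Return rows whose end time is earlier than their start time.

    Assumes each row is 'utt_id start end word' (whitespace separated).
    """
    bad = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        utt_id, word = fields[0], fields[3]
        start, end = float(fields[1]), float(fields[2])
        if end < start:
            bad.append((utt_id, start, end, word))
    return bad


# Rows taken from the example above.
rows = [
    "1769-143485-0006 14.1675 14.4675 all",
    "1769-143485-0006 14.4675 1.0975 animals",
    "004_F_01_07_01 1.0975 1.1975 the",
]
print(find_rollovers(rows))
```

On the rows above, only the "animals" row is reported, since its end time (1.0975) is smaller than its start time (14.4675).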

The abkhazia command that produces this bug is:

abkhazia align --recipe --force --verbose --words-only "english" -o alignment_words -l language -a am_trisa -f features --force

mmmaat commented 5 years ago

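Thanks Juan, actually this is even worse! When noise is present, the aligned words are offset within an utterance. The bug is in align::Alignment::_export_phones_and_words(). In the output below, the words "SIL", "NOISE", "recall" and "missing" each appear two phones later than they should: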

s0102a-sent17 3.1975 3.2275 ah i
s0102a-sent17 3.2275 3.3875 SIL
s0102a-sent17 3.3875 3.5075 NSN
s0102a-sent17 3.5075 3.7175 r SIL
s0102a-sent17 3.7175 3.8475 iy NOISE
s0102a-sent17 3.8475 4.2075 k recall
s0102a-sent17 4.2075 4.3575 ao
s0102a-sent17 4.3575 4.4075 l
s0102a-sent17 4.4075 4.4375 m
s0102a-sent17 4.4375 4.4875 ih
s0102a-sent17 4.4875 5.1975 s missing
s0102a-sent17 5.1975 6.1675 iy
s0102a-sent17 6.1675 6.1975 n
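The offset can be detected mechanically by checking that non-speech word labels sit on their own phones. This is only a sketch and not abkhazia code: it assumes rows of the form `utt_id start end phone [word]`, and it assumes that the word "SIL" belongs on the phone "SIL" and the word "NOISE" on the phone "NSN". The mapping `NON_SPEECH` and the helper `find_shifted_words` are hypothetical names.

```python
# Assumed mapping from non-speech word labels to the phone that should carry them.
NON_SPEECH = {"SIL": "SIL", "NOISE": "NSN"}


def find_shifted_words(lines):
    """Return (utt_id, phone, word) for non-speech words on the wrong phone."""
    shifted = []
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # rows without a word label are not word onsets
        utt_id, _start, _end, phone, word = fields
        expected = NON_SPEECH.get(word)
        if expected is not None and phone != expected:
            shifted.append((utt_id, phone, word))
    return shifted


# Rows taken from the output above.
rows = [
    "s0102a-sent17 3.2275 3.3875 SIL",
    "s0102a-sent17 3.3875 3.5075 NSN",
    "s0102a-sent17 3.5075 3.7175 r SIL",
    "s0102a-sent17 3.7175 3.8475 iy NOISE",
    "s0102a-sent17 3.8475 4.2075 k recall",
]
print(find_shifted_words(rows))
```

On these rows the check reports "SIL" attached to the phone "r" and "NOISE" attached to the phone "iy", which matches the two-phone shift visible in the output.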

mmmaat commented 4 years ago

Fixed in https://github.com/bootphon/abkhazia/commit/88e0ca5000582a086b16e796adaa56ddd50be625 and https://github.com/bootphon/abkhazia/commit/dcef08c5d72d386f4ca706237115a06ed67d6c72.