MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

Add --disable_dither option to align subcommand (workaround for #525) #750

Closed: arlofaria-zoom closed this 4 months ago

arlofaria-zoom commented 5 months ago

This isn't the ideal solution, as it's accessing a private attribute ._meta, but it's a quick workaround.

mmcauliffe commented 4 months ago

So it's a little more complicated to get deterministic features than just turning off dither: another source of variability is the lossy compression applied to feature archives, which I had to disable along with dither for deterministic feature testing here: https://github.com/mmcauliffe/kalpy/blob/main/tests/test_mfcc.py#L122.
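To make the compression point concrete, here's a minimal numpy-only sketch (a deliberate simplification, not Kaldi's actual CompressedMatrix layout) of why features read back from a lossily compressed archive fail a bit-exactness check against the originals, even though nothing random is involved:

```python
import numpy as np

# Toy illustration: lossy archive compression quantizes feature values, so the
# round-tripped features are close to, but not bitwise identical to, the
# originals -- deterministically so, but still not bit-exact.
feats = np.random.default_rng(0).normal(size=(100, 13)).astype(np.float32)

lo, hi = float(feats.min()), float(feats.max())
codes = np.round((feats - lo) / (hi - lo) * 65535).astype(np.uint16)  # 16-bit quantization
restored = (codes.astype(np.float32) / 65535) * (hi - lo) + lo

print(np.abs(restored - feats).max())   # small but nonzero quantization error
print(np.array_equal(restored, feats))  # False
```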

With that said, it'd be helpful for me to understand why this is necessary in the first place: what's the use case that requires deterministic features? In general the dithering/compression shouldn't affect alignments, or at least I didn't observe a big effect for the kalpy testing.

arlofaria-zoom commented 4 months ago

Thanks for the reply!

> In general the dithering/compression shouldn't affect alignments, or at least I didn't observe a big effect for the kalpy testing.

The dithering absolutely has an effect, which can be verified objectively. Whether it’s considered “big” is subjective.

I don’t see why the lossy compression should matter, though, so long as it’s deterministic. What randomness is involved in that?

I agree: achieving determinism is complicated, and disabling dithering isn’t even the right approach. (A better solution is to tweak Kaldi so that the RNG can be re-seeded.)
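To illustrate both points with a minimal numpy sketch (a stand-in for Kaldi's dithering, not its actual implementation): the added noise makes two runs differ sample-by-sample, and fixing the RNG seed is exactly what restores reproducibility.

```python
import numpy as np

# Dithering adds a small amount of Gaussian noise to the waveform before
# feature extraction. With an unseeded RNG, the "same" input produces slightly
# different samples (and hence features) on every run.
def dither(waveform, dither_value=1.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng  # fresh, unseeded state per call
    return waveform + dither_value * rng.standard_normal(len(waveform))

wav = np.zeros(16000, dtype=np.float32)          # one second of "silence" at 16 kHz

print(np.array_equal(dither(wav), dither(wav)))  # False: two runs differ

# Re-seeding the RNG (the fix suggested above) restores bitwise reproducibility:
run_a = dither(wav, rng=np.random.default_rng(1234))
run_b = dither(wav, rng=np.random.default_rng(1234))
print(np.array_equal(run_a, run_b))              # True: identical with a fixed seed
```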

There are other sources of non-determinism to consider, such as in the math library. For example, you may need to set MKL_CBWR=COMPATIBLE to get truly bitwise reproducibility on a given CPU architecture.
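Concretely, that means something like this at the very top of the entry point (or an equivalent export in the shell), before numpy or any other MKL-backed library is loaded:

```python
import os

# Intel MKL's Conditional Numerical Reproducibility setting: request
# bitwise-reproducible results from run to run (at some cost in speed).
# MKL reads this when it initializes, so it has to be in the environment
# before the math library is loaded.
os.environ["MKL_CBWR"] = "COMPATIBLE"
```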

That said, there are many use cases that would benefit from determinism: probably the most basic of these is “science”. It’s a general aim for experiments to have reproducible results, and it can be really frustrating when running the same experiment twice gives different results, which can sometimes lead to different conclusions. Another use case is continuous integration testing pipelines: you might compare a system’s output against an expected result and fail the CI check if there are any differences.

Much more specifically: the feature-level dithering was causing differences in alignments that affected WER scoring with NIST’s SCTK toolkit for evaluating ASR systems. The reason is that SCTK is quite strict about requiring hypothesis words to be timestamped within the time intervals of the reference’s utterance-level segmentation. Disabling the dither has now made the results deterministically reproducible, which in turn gives a team of software developers and researchers peace of mind that they are testing the same systems and getting the expected results.
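For reference, the scoring step is roughly the standard sclite invocation with a time-marked (CTM) hypothesis against an STM reference; the file names below are placeholders:

```python
import subprocess

# NIST SCTK scoring: with a CTM hypothesis, sclite places each hypothesis word
# into a reference segment based on its timestamps, so a word whose
# dither-shifted alignment drifts outside the reference segment can be scored
# differently and change the WER.
subprocess.run(
    ["sclite", "-r", "ref.stm", "stm", "-h", "hyp.ctm", "ctm", "-o", "sum", "stdout"],
    check=True,
)
```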

Hope that makes sense! :)

mmcauliffe commented 4 months ago

Fixed in #761

arlofaria-zoom commented 4 months ago

Many thanks for the fix, @mmcauliffe !!! 👍