Disable or pass through Kaldi's --dither option for deterministic feature extraction

arlofaria commented 1 year ago

Problem: MFA generates irreproducible output because of how feature extraction is configured in Kaldi. In particular, when dithering is applied, the RNG in Kaldi is not seeded deterministically.

Proposed solution: The easiest solution is to hardcode Kaldi's --dither option to a non-default value of 0.0, which disables dithering. This may be problematic for features that use energy instead of C0, in which case you'd also want to set an energy floor. Users who really require dithering could apply it to the input waveforms themselves, e.g. using sox (in which case pass -R to ensure reproducibility).
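To illustrate why dithering breaks reproducibility and why a value of 0.0 restores it, here is a minimal Python sketch. It is not Kaldi's code: the `dither` function is a hypothetical stand-in that adds Gaussian noise scaled by the dither value (my understanding of what Kaldi does), and Python's `random` module stands in for Kaldi's RNG.

```python
import random


def dither(samples, dither_value, rng=None):
    """Add Gaussian noise scaled by dither_value to each sample.

    Hypothetical stand-in for Kaldi's dithering step, not its actual code.
    """
    if dither_value == 0.0:
        # Dithering disabled: return the samples unchanged.
        return list(samples)
    if rng is None:
        # An OS-seeded generator mimics an RNG that is not seeded
        # deterministically.
        rng = random.Random()
    return [s + rng.gauss(0.0, 1.0) * dither_value for s in samples]


waveform = [0.0, 100.0, -250.0, 32.0]

# Two runs with dither enabled draw different noise.
run_a = dither(waveform, 1.0)
run_b = dither(waveform, 1.0)
print("dither=1.0 reproducible:", run_a == run_b)

# Two runs with dither disabled are identical to the input.
run_c = dither(waveform, 0.0)
run_d = dither(waveform, 0.0)
print("dither=0.0 reproducible:", run_c == run_d)
```

The noise is tiny relative to 16-bit sample values, but it is enough to make features, and therefore alignments, differ between runs.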

A better solution is to pass through a dither configuration variable from MFA, similar to how some feature extraction parameters are used in training mode. A minor complication is that these parameters do not seem to be passed through for alignment mode, as they are expected to be loaded from a model's JSON configuration.

Alternative solution: We could propose a change to Kaldi so that a seed can be set on, or passed to, the RandState instantiated for dithering: see feature-window.cc.
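As a rough sketch of what a seedable dither could look like, the following Python snippet passes an explicit seed to the noise generator. It models the idea only: the name `dither_seeded` and its `seed` parameter are hypothetical, and the real change would have to be made in Kaldi's C++ code.

```python
import random


def dither_seeded(samples, dither_value, seed):
    """Add Gaussian dither noise drawn from a generator with a fixed seed.

    Hypothetical illustration; this is not Kaldi's API.
    """
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, 1.0) * dither_value for s in samples]


waveform = [0.0, 100.0, -250.0, 32.0]

first = dither_seeded(waveform, 1.0, seed=1234)
second = dither_seeded(waveform, 1.0, seed=1234)
other = dither_seeded(waveform, 1.0, seed=4321)

print("same seed, same output:", first == second)
print("different seed, same output:", first == other)
```

With this approach dithering stays enabled, so the features still benefit from it, while repeated runs with the same seed produce the same output.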

mmcauliffe commented 1 year ago

For the next version, dither along with other feature calculation parameters can get passed through.

cassiotbatista commented 1 year ago

Hi guys,

Question for (and from) the lazy: how exactly can one disable dither?

The parameter is not listed in the feature configuration options, but I suppose I should pass a YAML file to mfa align via the --config_path argument, with the following content:

dither: 0

Right?

EDIT: the above should've been mfa validate, I guess, since that is the one that calls Kaldi's MFCC extractor. Either way, I tried, but it didn't work either via YAML or via the new command-line customization. I'll paste some logs below.

$ cat dither.yaml 
features:
  dither: 0

$ head -n 2 data/local/mfa/tmp/logs/validate.log 
# mfa validate -j 8 -t data/local/mfa/tmp --config_path dither.yaml --clean --acoustic_model_path portuguese_mfa data/local/mfa/corpus portuguese_brazil_mfa --dither 0 --brackets "" 
# Started at Sun Apr 23 15:03:39 -03 2023

$ head -n 2 data/local/mfa/tmp/logs/align.log
# mfa align -j 8 --clean -t data/local/mfa/tmp --config_path dither.yaml --output_format json data/local/mfa/corpus data/local/mfa/portuguese_brazil_mfa.dict portuguese_mfa data/local/mfa/alignments --dither 0 
# Started at Sun Apr 23 15:05:13 -03 2023

$ head -n 1 data/local/mfa/tmp/corpus/corpus/split8/log/make_mfcc.1.log
/home/ctbatista/miniconda3/envs/ufpa-espnet-mfa-py39/bin/compute-mfcc-feats --use-energy=false --dither=1 --energy-floor=0 --num-ceps=13 --num-mel-bins=23 --cepstral-lifter=22 --preemphasis-coefficient=0.97 --frame-shift=10 --frame-length=25 --low-freq=20 --high-freq=7800 --sample-frequency=16000 --allow-downsample=true --allow-upsample=true --snip-edges=true ark,s,cs:- ark,t:-

$ mfa version
2.2.9
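
For anyone who wants to check which dither value actually reached Kaldi, a small Python sketch like the one below can pull the options out of the first line of a make_mfcc log. The helper name `kaldi_options` is made up for illustration, and the sample line is abbreviated from the log above.

```python
import shlex


def kaldi_options(command_line):
    """Return the --key=value options of a Kaldi command line as a dict."""
    options = {}
    for token in shlex.split(command_line):
        if token.startswith("--") and "=" in token:
            key, value = token[2:].split("=", 1)
            options[key] = value
    return options


# Abbreviated from the make_mfcc.1.log line shown above.
line = (
    "compute-mfcc-feats --use-energy=false --dither=1 --energy-floor=0 "
    "--num-ceps=13 --snip-edges=true ark,s,cs:- ark,t:-"
)

opts = kaldi_options(line)
print("dither passed to Kaldi:", opts.get("dither"))
```

On the log line above this reports a dither of 1, which confirms that the requested value of 0 never reached the feature extractor.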

arlofaria-zoom commented 7 months ago

Oi @cassiotbatista! I think the problem is that the dither option is read from the meta.json file associated with a pretrained model, and so you'd need to modify that file as packaged for a particular model (e.g. in a ZIP archive).
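
A rough sketch of that repackaging step in Python, under several assumptions: the archive contains a file named meta.json, the dither value lives under a "features" key (as the `_meta["features"]["dither"]` path mentioned in this thread suggests), and MFA accepts the rewritten archive, which I have not verified. The function name `set_dither_in_model` and the demo archive contents are made up for illustration.

```python
import json
import os
import tempfile
import zipfile


def set_dither_in_model(src_zip, dst_zip, dither=0.0):
    """Copy a model archive, setting features.dither in any meta.json entry.

    Assumes meta.json is a JSON object with an optional "features" mapping.
    """
    with zipfile.ZipFile(src_zip) as src, zipfile.ZipFile(
        dst_zip, "w", zipfile.ZIP_DEFLATED
    ) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename.split("/")[-1] == "meta.json":
                meta = json.loads(data)
                meta.setdefault("features", {})["dither"] = dither
                data = json.dumps(meta, indent=2).encode("utf-8")
            # All other entries are copied through unchanged.
            dst.writestr(item.filename, data)


# Demo on a fake archive; a real model contains more files than this.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "model.zip")
    dst_path = os.path.join(tmp, "model_nodither.zip")
    with zipfile.ZipFile(src_path, "w") as zf:
        zf.writestr("model/meta.json", json.dumps({"features": {"dither": 1}}))
        zf.writestr("model/final.mdl", b"not a real model")

    set_dither_in_model(src_path, dst_path, dither=0.0)

    with zipfile.ZipFile(dst_path) as zf:
        patched = json.loads(zf.read("model/meta.json"))
        other_bytes = zf.read("model/final.mdl")

print(patched)
```

Writing to a new archive rather than editing in place keeps the original pretrained model intact in case the modified copy misbehaves.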

Hi @mmcauliffe! Alternatively, would you be open to a new feature that adds a --dither command-line option to mfa align? This would override whatever dither value was used to train the model, which might introduce a slight mismatch, but at least it enables deterministic behavior. (For example, it could set aligner.acoustic_model._meta["features"]["dither"], which is a bit of a monkey patch since there's no public setter for the acoustic model's MFCC options.)

Or are we perhaps misunderstanding what you meant by this comment?

"For the next version, dither along with other feature calculation parameters can get passed through."

arlofaria-zoom commented 7 months ago

I've created #750 ...

MontrealCorpusTools / Montreal-Forced-Aligner

Disable or pass through Kaldi's --dither option for deterministic feature extraction #525