Closed arlofaria closed 1 year ago
For the next version, dither along with other feature calculation parameters can get passed through.
Hi guys,
Question for (and from) the lazy: how exactly can one disable dither?
The param is not listed in the feat config opts, but I suppose I should pass a yaml file to mfa align via the --config_path arg, having the following content:
dither: 0
Right?
EDIT: the above should've been mfa validate, I guess, the one that calls Kaldi's MFCC extractor. Either way, I tried, but it didn't work, either via yaml or via the new command-line customization. I'll paste some logs below.
$ cat dither.yaml
features:
  dither: 0
$ head -n 2 data/local/mfa/tmp/logs/validate.log
# mfa validate -j 8 -t data/local/mfa/tmp --config_path dither.yaml --clean --acoustic_model_path portuguese_mfa data/local/mfa/corpus portuguese_brazil_mfa --dither 0 --brackets ""
# Started at Sun Apr 23 15:03:39 -03 2023
$ head -n 2 data/local/mfa/tmp/logs/align.log
# mfa align -j 8 --clean -t data/local/mfa/tmp --config_path dither.yaml --output_format json data/local/mfa/corpus data/local/mfa/portuguese_brazil_mfa.dict portuguese_mfa data/local/mfa/alignments --dither 0
# Started at Sun Apr 23 15:05:13 -03 2023
$ head -n 1 data/local/mfa/tmp/corpus/corpus/split8/log/make_mfcc.1.log
/home/ctbatista/miniconda3/envs/ufpa-espnet-mfa-py39/bin/compute-mfcc-feats --use-energy=false --dither=1 --energy-floor=0 --num-ceps=13 --num-mel-bins=23 --cepstral-lifter=22 --preemphasis-coefficient=0.97 --frame-shift=10 --frame-length=25 --low-freq=20 --high-freq=7800 --sample-frequency=16000 --allow-downsample=true --allow-upsample=true --snip-edges=true ark,s,cs:- ark,t:-
$ mfa version
2.2.9
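One quick way to confirm which dither value actually reached Kaldi is to parse the options out of the first line of the make_mfcc log. This is only a minimal sketch: the helper name kaldi_options is hypothetical, and it assumes nothing beyond the --name=value form visible in the log above.

```python
import re

def kaldi_options(command_line):
    """Return the --name=value options found in a Kaldi command line as a dict."""
    return dict(re.findall(r"--([A-Za-z0-9-]+)=(\S+)", command_line))

# Shortened copy of the command line logged in make_mfcc.1.log above.
line = ("compute-mfcc-feats --use-energy=false --dither=1 --energy-floor=0 "
        "--num-ceps=13 ark,s,cs:- ark,t:-")
options = kaldi_options(line)
print(options["dither"])  # prints 1
```

Values come back as strings, so compare against "0" or convert with float() before checking.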
Oi @cassiotbatista! I think the problem is that the dither option is read from the meta.json file associated with a pretrained model, and so you'd need to modify that file as packaged for a particular model (e.g. in a ZIP archive).
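A rough sketch of that workaround in Python, for anyone who wants to try it. It rests on several assumptions: that the model archive contains a file named meta.json, that the value lives under a "features" key (as the _meta["features"]["dither"] path mentioned in this thread suggests), and that MFA will pick up the patched archive rather than a cached copy. The function name patch_dither and the file names in the usage comment are hypothetical.

```python
import json
import zipfile
from pathlib import PurePosixPath

def patch_dither(src_zip, dst_zip, dither=0):
    """Copy src_zip to dst_zip, setting features.dither in every meta.json.

    Returns the number of meta.json files that were rewritten.
    """
    patched = 0
    with zipfile.ZipFile(src_zip) as src, zipfile.ZipFile(dst_zip, "w") as dst:
        for info in src.infolist():
            data = src.read(info)
            if PurePosixPath(info.filename).name == "meta.json":
                meta = json.loads(data)
                # Assumed layout: {"features": {"dither": ...}, ...}
                meta.setdefault("features", {})["dither"] = dither
                data = json.dumps(meta, indent=2).encode("utf-8")
                patched += 1
            # Reuse the original entry metadata (name, compression type).
            dst.writestr(info, data)
    return patched

# Hypothetical usage:
# patch_dither("portuguese_mfa.zip", "portuguese_mfa_nodither.zip", dither=0)
```

Writing to a new archive instead of editing in place keeps the original model intact in case the patched copy behaves differently.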
Hi @mmcauliffe! Alternatively, would you be open to a new feature that adds a --dither command-line option to mfa align? This would override any dither value that was used to train the model, which might be somewhat mismatched, but at least it enables deterministic behavior. (For example, it could set aligner.acoustic_model._meta["features"]["dither"], which is a bit of a monkey patch since there's no public setter for the acoustic model's MFCC options.)
Or perhaps are we misunderstanding what you meant by this comment?
For the next version, dither along with other feature calculation parameters can get passed through.
I've created #750 ...
Problem
MFA generates irreproducible output, due to the configuration of feature extraction in Kaldi. In particular, when applying a dither, the RNG in Kaldi is not seeded deterministically.

Proposed solution
The easiest solution is to hardcode Kaldi's --dither option to a non-default value of 0.0 to disable dithering. This may be problematic for features that use energy instead of C0, in which case you'd also want to set some energy floor. If users really require dithering, they could apply it to the input waveforms themselves, e.g. using sox (in which case pass -R to ensure reproducibility).

A better solution is to pass through a dither configuration variable from MFA, similar to how some feature extraction parameters are used in training mode. A minor complication is that these parameters do not seem to be passed through for alignment mode, as they are expected to be loaded from a model's JSON configuration.

Alternative solution
We could propose a change to Kaldi so that a seed could be set or passed to the RandState instantiated for dithering purposes: see feature-window.cc.
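To illustrate the idea of applying dither to the waveform yourself with a fixed seed, here is a minimal sketch in plain Python. It assumes that dither amounts to adding zero-mean Gaussian noise scaled by the dither value to each sample, which is my understanding of Kaldi's behavior rather than a copy of its code. The function name add_seeded_dither and the toy samples are hypothetical.

```python
import random

def add_seeded_dither(samples, dither=1.0, seed=0):
    """Add zero-mean Gaussian noise scaled by `dither`, using a fixed seed."""
    rng = random.Random(seed)  # private RNG, so the global state is untouched
    return [s + rng.gauss(0.0, 1.0) * dither for s in samples]

samples = [0.0, 120.0, -340.0, 15.0]  # toy waveform on a 16-bit sample scale
first = add_seeded_dither(samples, dither=1.0, seed=1234)
second = add_seeded_dither(samples, dither=1.0, seed=1234)
print(first == second)  # prints True
```

With the waveform dithered this way ahead of time, feature extraction itself could then run with dithering disabled, so repeated runs see identical input.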