mattpitkin closed this issue 1 month ago.
I've done some testing on the Valentini dataset using the evaluation metrics within DeepFilterNet and found very little difference between using the "fixed" normalisation and the original one. Below is a plot showing, for each metric and each audio example, the percentage relative difference between the metric value from a model trained with the "fixed" (new) normalisation and from a model trained with the current (original) normalisation.
On average, the "fixed" normalisation does better on all metrics except SSNR (which is the same on average), although the difference is rather marginal.
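For reference, the per-example percentage relative difference for one metric, and its average over examples, could be computed as in this Rust sketch. The definition 100 × (new − orig) / orig and the function names are assumptions, and the values in main are made up for illustration; they are not results from the Valentini tests.

```rust
/// Percentage relative difference of a metric value from the "fixed"-normalisation
/// model (`new`) relative to the original-normalisation model (`orig`).
/// Assumption: defined as 100 * (new - orig) / orig.
fn pct_rel_diff(new: f64, orig: f64) -> f64 {
    100.0 * (new - orig) / orig
}

/// Mean of the per-example percentage relative differences for a single metric.
fn mean_pct_rel_diff(new: &[f64], orig: &[f64]) -> f64 {
    assert_eq!(new.len(), orig.len(), "need one value per audio example");
    let sum: f64 = new
        .iter()
        .zip(orig)
        .map(|(&n, &o)| pct_rel_diff(n, o))
        .sum();
    sum / new.len() as f64
}

fn main() {
    // Hypothetical per-example values for one metric (illustration only).
    let orig = [2.0, 3.0, 4.0];
    let new = [2.1, 3.0, 3.9];
    println!("mean % relative difference: {:.3}", mean_pct_rel_diff(&new, &orig));
}
```

A positive mean indicates the "fixed" model scores higher for that metric; for metrics where lower is better, the sign would need to be read the other way.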
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.
I have noticed that the normalisation of the complex spectrogram features for the deep filtering is not doing what is expected (as described in, say, equation 12 of https://ieeexplore.ieee.org/document/9855850). In the band_unit_norm and band_unit_norm_t functions in lib.rs, the running estimates of the mean of the absolute values of the spectrogram (i.e., estimates of the standard deviation of the spectrogram) are square rooted before being used for normalisation. I don't think the square root should be applied, because it is not the variance that is being estimated.
I've tested the spectrograms with and without the square root on the noisy_snr0.wav file. With the square root (i.e., the current code), I get:
The spectrogram is not really unit normalised.
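A rough argument for why, assuming the running estimate is an exponential mean of |X| with smoothing factor α that has settled to the stationary mean of each frequency bin f, and treating it as a constant once settled:

```latex
\begin{aligned}
\hat{\mu}_f(t) &= \alpha\,\hat{\mu}_f(t-1) + (1-\alpha)\,\lvert X(t,f)\rvert
  \;\approx\; \mu_f = \mathbb{E}\,\lvert X(t,f)\rvert, \\
\mathbb{E}\left\lvert \frac{X(t,f)}{\hat{\mu}_f} \right\rvert &\approx \frac{\mu_f}{\mu_f} = 1, \\
\mathbb{E}\left\lvert \frac{X(t,f)}{\sqrt{\hat{\mu}_f}} \right\rvert &\approx \frac{\mu_f}{\sqrt{\mu_f}} = \sqrt{\mu_f}.
\end{aligned}
```

So dividing by the square root only gives unit-scale values when the mean magnitude in a bin happens to be 1, whereas dividing by the estimate itself gives values close to 1 regardless of the signal level.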
Whereas if I "fix" the normalisation by removing the square root and repeat the same test, I get:
and the numbers are close to 1.
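The comparison can be reproduced on synthetic data with a minimal, self-contained Rust sketch. It does not use DeepFilterNet's code: the C32 type, unit_norm_frame, mean_norm_magnitude, the initial state value and the update s = α·s + (1 − α)·|X| are hypothetical stand-ins for what band_unit_norm is described as doing, not a copy of lib.rs.

```rust
/// Minimal complex number so the sketch needs no external crates.
#[derive(Clone, Copy)]
struct C32 {
    re: f32,
    im: f32,
}

impl C32 {
    fn norm(self) -> f32 {
        self.re.hypot(self.im)
    }
    fn scale(self, k: f32) -> C32 {
        C32 { re: self.re * k, im: self.im * k }
    }
}

/// Hypothetical per-bin unit normalisation of one STFT frame.
/// `state[f]` is an exponentially smoothed estimate of the mean |X| in bin f.
/// `use_sqrt = true` divides by sqrt(state) (the current behaviour described in
/// the issue); `use_sqrt = false` divides by state itself (the proposed fix).
fn unit_norm_frame(frame: &mut [C32], state: &mut [f32], alpha: f32, use_sqrt: bool) {
    for (x, s) in frame.iter_mut().zip(state.iter_mut()) {
        let v = *x;
        *s = v.norm() * (1.0 - alpha) + *s * alpha;
        let est = *s;
        let denom = if use_sqrt { est.sqrt() } else { est };
        *x = v.scale(1.0 / denom);
    }
}

/// Normalise every frame and return the mean |X_norm| over the last `tail`
/// frames, i.e. after the running estimate has settled.
fn mean_norm_magnitude(spec: &[Vec<C32>], alpha: f32, use_sqrt: bool, tail: usize) -> f32 {
    let mut state = vec![1e-3f32; spec[0].len()]; // hypothetical initial value
    let (mut total, mut count) = (0.0f32, 0usize);
    for (t, frame) in spec.iter().enumerate() {
        let mut frame = frame.clone();
        unit_norm_frame(&mut frame, &mut state, alpha, use_sqrt);
        if t + tail >= spec.len() {
            total += frame.iter().map(|x| x.norm()).sum::<f32>();
            count += frame.len();
        }
    }
    total / count as f32
}

fn main() {
    // Synthetic spectrogram: magnitude 100 in every bin, rotating phase.
    let spec: Vec<Vec<C32>> = (0..500)
        .map(|t| {
            (0..4)
                .map(|f| {
                    let ph = 0.1 * (t * (f + 1)) as f32;
                    C32 { re: 100.0 * ph.cos(), im: 100.0 * ph.sin() }
                })
                .collect()
        })
        .collect();
    println!("with sqrt:    {:.3}", mean_norm_magnitude(&spec, 0.9, true, 100));
    println!("without sqrt: {:.3}", mean_norm_magnitude(&spec, 0.9, false, 100));
}
```

With a constant magnitude of 100, the square-rooted version settles around sqrt(100) = 10 rather than 1, while the version without the square root settles close to 1.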
In practice, I've trained a model with the "fix" and it doesn't seem to make a noticeable difference (although admittedly I've not done a thorough set of tests).