Roughness ECMA (2nd Ed, 2022) implementation validation

fchirono commented 7 months ago

My initial attempt at implementing Roughness following the ECMA 418-2 (2nd Ed, 2022) standard did not produce the expected results.

Roughness values from roughness_ecma (blue circles) are overestimated for synthetic sounds (amplitude-modulated sine waves). They match neither the reference values (dashed grey line) from Fastl & Zwicker's "Psychoacoustics" nor the Roughness calculated with the Daniel & Weber method (roughness_dw, yellow squares, already implemented in MoSQITo).

[Validation script and figures available on MoSQITo\validations\sq_metrics\roughness_ecma ]
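For anyone reproducing the synthetic test signals, a minimal sketch of a sinusoidally amplitude-modulated tone scaled to a target SPL is below. It is not the MoSQITo validation script: the function name am_tone, the 48 kHz sample rate and the 1 s duration are assumptions chosen for illustration. The defaults (1 kHz carrier, 70 Hz modulation, 100 % modulation depth, 60 dB SPL) correspond to the reference condition commonly used to define 1 asper.

```python
import numpy as np

P_REF = 20e-6  # reference sound pressure in Pa


def am_tone(fc=1000.0, fm=70.0, m=1.0, spl_db=60.0, fs=48000, duration=1.0):
    """Hypothetical helper: sinusoidally amplitude-modulated tone.

    x(t) = (1 + m * sin(2*pi*fm*t)) * sin(2*pi*fc*t), rescaled so that its
    overall RMS pressure corresponds to spl_db (dB re 20 uPa).
    Returns the signal in Pa and the sample rate.
    """
    t = np.arange(int(round(fs * duration))) / fs
    x = (1.0 + m * np.sin(2 * np.pi * fm * t)) * np.sin(2 * np.pi * fc * t)
    target_rms = P_REF * 10 ** (spl_db / 20)
    x *= target_rms / np.sqrt(np.mean(x ** 2))  # calibrate to the target SPL
    return x, fs


x, fs = am_tone()
spl = 20 * np.log10(np.sqrt(np.mean(x ** 2)) / P_REF)
```

Scaling by the measured RMS (rather than an analytic formula) keeps the calibration correct for any modulation depth m.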

fchirono commented 7 months ago

@wantysal: sorry for tagging you here. I forked your Roughness ECMA implementation but I can't get it to run; would you like to enable Issues on your fork so we can discuss it over there?

I'm also happy to discuss my implementation here. Do let me know if/when you have a chance to run it.

wantysal commented 7 months ago

Hi Fabio,

I opened an issue on my fork, with some validation plots as well. I tried to output the same plots as yours so they are easier to compare ;) I have successfully run your version of the code! I've already seen some steps where we chose different implementation strategies. I will run both codes simultaneously to compare the intermediate results step by step. I think this way we'll be able to discuss the critical points and find the best version we can!

Tks, Salomé

fchirono commented 7 months ago

That's great, thank you for looking into it! Your results are looking pretty good, I'm confident we'll be able to figure this out!

Please let me know if you have any questions, or if you'd like me to look into something.

wantysal commented 6 months ago

Hi Fabio,

I've compared our two implementations from A to Z and found a few differences:

1. The main difference is that I used np.fft.rfft to compute the DFT while you used np.fft.fft. When using np.fft.fft, an amplitude correction of 2/sqrt(2) has to be applied to get the correct SPL spectrum.
2. In the file _env_noise_reduction, when you average the first and last windows, I think you should divide by 2 and not 3.
3. In the file _est_fund_mod_rate, I don't understand what your frequency axis _f_pihat is. According to Equations 91 to 95, the frequency used to apply the low-modulation-rate weighting is the fundamental modulation rate you estimated before. Only one frequency is used in Equation 95, but you use two different frequencies. In _weight_low_mod_rates, I think you should call _weight_factor_G(f_p_imax, f_max, q1_low, q2_low).
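To make the amplitude-correction point concrete, here is a minimal sketch of a single-sided RMS spectrum. It assumes a rectangular window and 1/N normalisation for illustration only; ECMA 418-2's own windowing and scaling are not reproduced, and single_sided_rms_spectrum is a hypothetical helper, not MoSQITo code. Since the first N/2 + 1 bins of np.fft.fft equal the output of np.fft.rfft for a real signal, the same factor 2/sqrt(2) applies whichever function is used; what matters is that only the non-negative-frequency half is kept.

```python
import numpy as np


def single_sided_rms_spectrum(x):
    """Hypothetical helper: RMS amplitude per bin for a real signal.

    Bins other than DC (and Nyquist, for even N) are multiplied by
    2/sqrt(2): the factor 2 folds in the discarded negative frequencies,
    and 1/sqrt(2) converts peak amplitude to RMS.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    spec = np.abs(np.fft.rfft(x)) / n
    if n % 2 == 0:
        spec[1:-1] *= 2 / np.sqrt(2)  # last bin is Nyquist: no folding
    else:
        spec[1:] *= 2 / np.sqrt(2)    # odd N: no Nyquist bin
    return spec


# Demo: a unit-amplitude sine that falls exactly on bin k of a 512-point DFT
n, fs, k = 512, 48000, 10
t = np.arange(n) / fs
x = np.sin(2 * np.pi * (k * fs / n) * t)
spec = single_sided_rms_spectrum(x)
```

For this bin-centred sine, spec[k] equals the signal's RMS value, 1/sqrt(2).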
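The edge-averaging point can be illustrated with a small sketch. smooth_adjacent is a hypothetical helper, not the code in _env_noise_reduction: it averages each entry along the first axis with its immediate neighbours, so interior entries are a mean of three terms while the first and last entries have only two terms available and must be divided by 2. Dividing a two-term sum by 3 would bias the edges low by one third, which is easy to see with a constant input.

```python
import numpy as np


def smooth_adjacent(x):
    """Hypothetical helper: average each entry with its neighbours along axis 0.

    Interior entries: mean of 3 terms (previous, current, next).
    First/last entries: mean of the 2 terms that exist.
    """
    x = np.asarray(x, dtype=float)
    if x.shape[0] < 2:
        raise ValueError("need at least two entries along axis 0")
    out = np.empty_like(x)
    out[1:-1] = (x[:-2] + x[1:-1] + x[2:]) / 3
    out[0] = (x[0] + x[1]) / 2     # only the first entry and its right neighbour
    out[-1] = (x[-2] + x[-1]) / 2  # only the last entry and its left neighbour
    return out


ramp = smooth_adjacent([0.0, 3.0, 6.0, 9.0])
flat = smooth_adjacent(np.ones((5, 4)))  # a constant input stays constant
```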

With these modifications, I hope your results will improve !

Salomé

fchirono commented 6 months ago

Hi Salomé, thank you for the feedback. Regarding the differences:

1. My understanding of the ECMA standard (Section 7.1.3) was that a conventional DFT of 512 points was required, even though all operations in Sections 7.1.4 and 7.1.5 are performed only on the first 256 points of the resulting spectrum. It does make sense that an amplitude correction for a single-sided spectrum is required, although this is not specified anywhere in the standard (that I could see)!
2. This is a good point too, although I imagine it might make little difference to the final result. I'll adopt your suggestion though; it does make sense.
3. This section of the standard (7.1.5) is the one I found most confusing. The way I see it, the variable f_pi_hat is a NumPy array containing the frequencies of the most prominent peaks in the power spectrum that are considered part of the same envelope, whereas f_p_imax (denoted as "_f_p (imax)" in the standard, just below Eq. 91) is the estimated fundamental rate of this modulation spectrum (hence, a float). For example, a signal with a non-sinusoidal modulation at a fundamental frequency of 70 Hz will have spectral peaks at integer multiples of the fundamental modulation rate (70 Hz, 140 Hz, 210 Hz, etc.), so a possible result for these variables could be f_pi_hat = np.array([70, 140, 210]) and f_p_imax = 70. Having said all that, I think you might be correct: f_p_imax should be the argument passed to _weight_factor_G.
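The distinction between an array of harmonic peak frequencies (like f_pi_hat) and a single fundamental rate (like f_p_imax) can be sketched as below. This is a deliberately simplified, swapped-in heuristic, not the ECMA 418-2 Section 7.1.5 procedure, which uses its own peak-ratio criteria; estimate_fundamental, max_divisor and tol are hypothetical names. It tries the lowest peak divided by n = 1, 2, ... and returns the first (largest) candidate of which every peak is a near-integer multiple, so the result is one float that a weighting function would then be evaluated at once.

```python
import numpy as np


def estimate_fundamental(peak_freqs, max_divisor=5, tol=0.05):
    """Hypothetical helper: reduce harmonic peak frequencies to one float.

    Tries f_min / n for n = 1..max_divisor and returns the first candidate
    for which every peak frequency is within tol of an integer multiple.
    Falls back to the lowest peak if no candidate fits.
    """
    f = np.sort(np.asarray(peak_freqs, dtype=float))
    for n in range(1, max_divisor + 1):
        candidate = f[0] / n
        ratios = f / candidate
        if np.all(np.abs(ratios - np.round(ratios)) <= tol):
            return float(candidate)
    return float(f[0])


f_pi_hat = np.array([70.0, 140.0, 210.0])      # harmonic peaks (array)
f_p_imax = estimate_fundamental(f_pi_hat)       # fundamental rate (scalar)
missing = estimate_fundamental([140.0, 210.0, 280.0])  # fundamental not among peaks
```

The second call shows why dividing the lowest peak is useful: when the 70 Hz component itself is absent, the peaks at 140, 210 and 280 Hz still point to a 70 Hz fundamental.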

I'll test all of the above modifications and see how much they change the results. Thank you very much for your help, it's much appreciated!

fchirono commented 6 months ago

Hi Salomé,

I tried implementing the changes you suggested, but no luck: if anything, my validation results are now worse! :(

I saw that your validation results are looking much better than mine (great work on that, by the way!). I'm in no hurry to fix my version, so I think I will just wait for your implementation to be finalized and included in MoSQITo to do a more in-depth comparison of our codes and find out what I'm doing wrong.

Thank you once again for your assistance with this!

fchirono / MoSQITo

Roughness ECMA (2nd Ed, 2022) implementation validation #7