Eomys / MoSQITo

MoSQITo is a unified and modular development framework of key sound quality metrics, favoring reproducible science and efficient shared scripting within the community of engineers, teachers and researchers.
Apache License 2.0

issue when computing sharpness and loudness #9

Closed annezhangxue closed 3 years ago

annezhangxue commented 3 years ago

Dear @Eomys

I have installed the mosqito 0.1.0 package with `pip install` and I can import mosqito successfully. However, I ran into problems when I tried to compute the sharpness and loudness, as shown below:

image

Do you happen to know this issue, and could you please advise how to handle it?

image

Thank you very much. Best Regards,

mglesser commented 3 years ago

Hi @annezhangxue, I am currently working on a new release of the MoSQITo pip package, which should solve your issue. I will notify you as soon as it is available (today I hope, maybe tomorrow).

annezhangxue commented 3 years ago

Hello @mglesser, thank you very much for the update. It is working now. I looked through the code: does the library also support block processing with a preset window length when calculating time series of sharpness and loudness? I have an audio file with around 9 hours of recorded traffic noise, and the computation takes a very long time when I simply apply the comp_loudness function. Thanks.

mglesser commented 3 years ago

Hello @annezhangxue, in MoSQITo, the loudness and sharpness of time-varying signals are computed over successive time segments of 2 ms (according to the standard). This time resolution is suited to relatively short signals. In the case of long recordings of traffic noise, I guess that the signal can be divided into time segments of a few seconds or minutes. The signal in each of those segments can then be considered stationary and processed with the stationary version of the loudness and sharpness. Another approach, if you consider that the signal in each segment is not stationary, is to apply the time-varying algorithm and collect statistical parameters to represent the segment (average, min/max, ...).
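To illustrate the segment-wise idea, here is a minimal sketch in plain Python. It is not MoSQITo code: `iter_segments`, `blockwise` and `rms_level` are hypothetical names, and `rms_level` is only a placeholder for whatever stationary loudness or sharpness function you would actually call on each segment. The segment length and the choice to drop an incomplete trailing segment are assumptions as well.

```python
import math


def iter_segments(signal, fs, segment_s):
    """Yield (start_time_s, block) for consecutive, non-overlapping blocks.

    An incomplete trailing block is dropped (an assumption of this sketch).
    """
    n = int(round(segment_s * fs))
    if n <= 0:
        raise ValueError("segment_s * fs must be at least one sample")
    for start in range(0, len(signal) - n + 1, n):
        yield start / fs, signal[start:start + n]


def rms_level(block, fs):
    """Placeholder metric (RMS value), standing in for a stationary
    loudness or sharpness computation on one block."""
    return math.sqrt(sum(x * x for x in block) / len(block))


def blockwise(signal, fs, segment_s, metric=rms_level):
    """Apply `metric` to every block and return a list of (time_s, value)."""
    return [(t, metric(block, fs))
            for t, block in iter_segments(signal, fs, segment_s)]


# Example: 3.5 s of a 50 Hz sine sampled at 1 kHz, processed in 1 s blocks
fs = 1000
signal = [math.sin(2 * math.pi * 50 * i / fs) for i in range(3 * fs + 500)]
results = blockwise(signal, fs, segment_s=1.0)
for t, value in results:
    print(f"{t:5.1f} s  {value:.4f}")
```

Because the metric is passed in as a function, the same loop can be reused for either approach: one value per segment from a stationary metric, or a summary (average, min/max) of a time-varying result computed inside the function.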

annezhangxue commented 3 years ago

Thanks for the answer, @mglesser. Are you also working on fluctuation strength now? And will you also work on a psychoacoustic annoyance metric for the package? Thank you.

mglesser commented 3 years ago

Hello @annezhangxue, two main developments are currently ongoing

annezhangxue commented 3 years ago

Hello @mglesser, thanks for the update. I work mainly on deep learning. My dataset comes from traffic noise, and the scope of the whole study is more or less related to psychoacoustics, although I don't have an acoustics or physics background. Maybe you can have a look at another Python package from an EU project: https://github.com/AudioCommons/timbral_models/tree/master/timbral_models They have also calculated some psychoacoustic metrics. I hope it will be of some inspiration.

mglesser commented 3 years ago

Hello @annezhangxue, thanks for the link. It is indeed inspiring. If you feel that your problem is now solved, I will let you close the issue. Otherwise, I will be happy to help you further if possible.

annezhangxue commented 3 years ago

@mglesser yes, thank you, I am closing the issue now :-)

annezhangxue commented 3 years ago

Hello @mglesser, may I ask one more question, please? Would you consider traffic noise a stationary or a time-varying signal when computing the psychoacoustic metrics? The Zwicker model N5 loudness applies only to time-varying noise, right? Thank you.

mglesser commented 3 years ago

Hello, the difference between the stationary and time-varying Zwicker models is that the latter applies a temporal weighting to the total loudness versus time (to simulate the duration-dependent behaviour of loudness perception for short impulses). If your signal varies rapidly, you'd better use the time-varying loudness model and extract the N5 loudness, for instance. But if your signal varies slowly, you can use the stationary loudness on successive time segments of your signal (the length of each segment is to be determined depending on your signal) and extract the N5 value of the loudness over all the segments (the loudness exceeded in 5% of them, i.e. the 95th percentile) to get a single value. That said, I'm not an expert in traffic noise analysis, so you may want to cross-check this information. Best regards, Martin
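As a rough sketch of how a single value can be extracted from a series of loudness values (either the time-varying loudness or one stationary value per segment): N5 is the loudness exceeded 5% of the time, which corresponds to the 95th percentile of the series. The function name `exceeded_level`, the linear interpolation between ranks and the sample values are assumptions made for this example, not part of MoSQITo.

```python
import math
import statistics


def exceeded_level(values, exceed_pct=5.0):
    """Return the level exceeded by `exceed_pct` percent of the values.

    exceed_pct=5 gives N5, i.e. the 95th percentile of the series.
    Linear interpolation between neighbouring ranks is an assumption
    of this sketch; other percentile conventions exist.
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (1.0 - exceed_pct / 100.0) * (len(ordered) - 1)
    lo = math.floor(rank)
    hi = math.ceil(rank)
    frac = rank - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


# Hypothetical loudness series (arbitrary values, for illustration only)
loudness = [float(v) for v in range(1, 102)]

summary = {
    "N5": exceeded_level(loudness, 5.0),
    "mean": statistics.mean(loudness),
    "min": min(loudness),
    "max": max(loudness),
}
print(summary)
```

The same helper also covers the average and min/max statistics mentioned earlier in the thread, so one pass over the segment values gives all the descriptors at once.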

annezhangxue commented 3 years ago

I see, thank you very much:-) @mglesser