audeering / opensmile-python

Python package for openSMILE
https://audeering.github.io/opensmile-python/

Scale of the loudness feature in the eGeMAPS set #77

Open YangLiyli131 opened 1 year ago

YangLiyli131 commented 1 year ago

Hello, I'm using this package to extract the loudness of audio files with the eGeMAPSv02 feature set ('loudness_sma3'). The values it returns are very small, close to one. What is the scale/unit of this feature, and how can I convert it to dB? Thank you.

dattilson commented 1 year ago

Well, from my (admittedly limited) understanding, according to the GeMAPS paper: https://ieeexplore.ieee.org/document/7160715

"Loudness is used here as a more perceptually relevant [62] alternative to the signal energy. In order to approximate humans’ non-linear perception of sound, an auditory spectrum as is applied in the Perceptual Linear Prediction (PLP) technique [63] is adopted. A non-linear Mel-band spectrum is constructed by applying 26 triangular filters distributed equidistant on the Mel-frequency scale from 20–8000 Hz to a power spectrum computed from a 25 ms frame. An auditory weighting with an equal loudness curve as used by [63] and originally adopted from [64] is performed. Next, a cubic root amplitude compression is performed for each band b of the equal loudness weighted Mel-band power spectrum [63], resulting in a spectrum which is referred to as auditory spectrum. Loudness is then computed as the sum over all bands of the auditory spectrum."

I believe the PLP technique refers to https://pubs.aip.org/asa/jasa/article/87/4/1738/930759/Perceptual-linear-predictive-PLP-analysis-of
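
The steps in the quoted passage can be sketched in plain NumPy to show why the result is not a dB value. This is only a rough illustration, not openSMILE's actual implementation: the Hann window, the Mel formula, the filter normalization, and the equal-loudness approximation (written from memory of the PLP paper) are all assumptions, and the function names are made up for this example. Absolute values will therefore not match what the package returns.

```python
import numpy as np

def hz_to_mel(f):
    # Common Mel formula (assumed; openSMILE may use a different variant).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_bands, n_fft, sr, f_lo=20.0, f_hi=8000.0):
    # Triangular filters spaced equidistantly on the Mel scale.
    # Assumes f_hi does not exceed the Nyquist frequency sr / 2.
    hz_edges = mel_to_hz(np.linspace(hz_to_mel(f_lo), hz_to_mel(f_hi), n_bands + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_bands, freqs.size))
    for b in range(n_bands):
        lo, mid, hi = hz_edges[b:b + 3]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[b] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb, hz_edges[1:-1]  # filter weights and band centre frequencies

def equal_loudness_weight(f_hz):
    # Equal-loudness approximation in the style of the PLP paper (assumed form).
    w2 = (2.0 * np.pi * f_hz) ** 2
    return ((w2 + 56.8e6) * w2 ** 2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))

def frame_loudness(frame, sr, n_bands=26):
    # Power spectrum of one windowed frame (25 ms = 400 samples at 16 kHz).
    power = np.abs(np.fft.rfft(frame * np.hanning(frame.size))) ** 2
    fb, centres = mel_filterbank(n_bands, frame.size, sr)
    band_power = fb @ power                               # Mel-band power spectrum
    weighted = band_power * equal_loudness_weight(centres)  # auditory weighting
    auditory = np.cbrt(weighted)                          # cubic root compression
    return float(auditory.sum())                          # sum over all bands

# Usage: the same 440 Hz tone at two amplitudes that differ by 20 dB.
sr = 16000
t = np.arange(400) / sr
quiet = 0.01 * np.sin(2.0 * np.pi * 440.0 * t)
loud = 10.0 * quiet
ratio = frame_loudness(loud, sr) / frame_loudness(quiet, sr)
print(ratio)
```

In this sketch a 20 dB increase in amplitude multiplies the power in every band by 100, so after the cubic root the loudness grows by a factor of 100^(1/3), roughly 4.64. The value is a compressed, perceptually weighted sum over bands on a linear scale, so there is no fixed offset that turns it into dB; at best, a ratio of two loudness values can be related to a level difference for signals with the same spectral shape.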