audeering / opensmile

The Munich Open-Source Large-Scale Multimedia Feature Extractor
https://audeering.github.io/opensmile/

Spectral Centroid for white noise has large offset #70

Closed Menrath closed 11 months ago

Menrath commented 11 months ago

At a sampling rate of 48 kHz, the spectral centroid of a white noise signal should in theory (and quite intuitively) lie at half the Nyquist frequency. The Nyquist frequency is 24 kHz in that case, so the centroid should be about 12 kHz.
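
The expectation can be written out explicitly. This is a sketch that assumes an ideally flat magnitude spectrum $|X(f)| = c$ between 0 and the Nyquist frequency $f_N = f_s/2$; a finite noise recording only approximates this, which is why measured values scatter around the result.

```latex
\[
f_c
  = \frac{\int_0^{f_N} f\,|X(f)|\,df}{\int_0^{f_N} |X(f)|\,df}
  = \frac{c\,f_N^2/2}{c\,f_N}
  = \frac{f_N}{2}
  = \frac{f_s}{4}
  = 12\,\text{kHz}
  \qquad (f_s = 48\,\text{kHz})
\]
```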

openSMILE instead calculates a value of about 168xx Hz. I tested this using the packaged emo_large.conf.

To reproduce, run this Python script:

import pandas as pd
from scipy.io import wavfile
from scipy import stats
import numpy as np
import librosa
import subprocess

# Config path for emo_large.conf (linux)
opensmile_config_emo_large_path = '/usr/share/opensmile/config/misc/emo_large.conf'
opensmile_binary = 'SMILExtract'

# Generate white noise and save it to disk as wav
sample_rate = 48000
length_in_seconds = 4
amplitude = 11
noise = stats.truncnorm(-1, 1, scale=min(2**16, 2**amplitude)).rvs(sample_rate * length_in_seconds)
wavfile.write('noise.wav', sample_rate, noise.astype(np.int16))

# Librosa
audio_file = 'noise.wav'
y, sr = librosa.load(audio_file, sr=sample_rate)
librosa_cog = librosa.feature.spectral_centroid(y=y, sr=sr)
librosa_mean_cog = np.mean(librosa_cog)
print("Librosa: " + str(librosa_mean_cog))

# Manual
def spectral_centroid(x, samplerate=sample_rate):
    magnitudes = np.abs(np.fft.rfft(x))  # magnitudes of the non-negative frequencies
    # rfftfreq returns exactly the bin frequencies that match rfft's output
    freqs = np.fft.rfftfreq(len(x), d=1.0/samplerate)
    # magnitude-weighted mean frequency
    return np.sum(magnitudes*freqs) / np.sum(magnitudes)
manual_mean_cog = spectral_centroid(y, sr)
print("Manual: " + str(manual_mean_cog))

# OpenSMILE
opensmile_command = [opensmile_binary,
                     '-C', opensmile_config_emo_large_path,
                     '-I', 'noise.wav',
                     '-csvoutput', 'noise.csv']
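# check=True raises CalledProcessError if SMILExtract exits with an error
subprocess.run(opensmile_command, check=True)
df = pd.read_csv('noise.csv', delimiter=';')
# Take the scalar from the first row instead of printing the whole Series
opensmile_mean_cog = df['pcm_fftMag_spectralCentroid_sma_amean'].iloc[0]
print("OpenSMILE: " + str(opensmile_mean_cog))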

which prints:

Librosa: 12010.02111870055
Manual: 12005.953776257056
OpenSMILE: 16872.99

maxschmitt commented 11 months ago

The cause is the pre-emphasis that the emo_large.conf config file applies to the signal. It amplifies the higher frequencies, so the spectrum is no longer "white".
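
The effect can be checked without openSMILE. The following is a minimal NumPy sketch, not openSMILE's actual implementation: it assumes a first-order pre-emphasis filter y[n] = x[n] - k*x[n-1] with k = 0.97 (a common default; the value used by emo_large.conf is not stated in this thread) and applies it to the whole signal rather than per frame. Since the thread also does not say whether openSMILE weights the centroid by magnitude or by power, the hypothetical helper `spectral_centroid` supports both.

```python
import numpy as np


def spectral_centroid(x, sample_rate, power=False):
    """Centroid of the one-sided spectrum, weighted by magnitude or by power."""
    weights = np.abs(np.fft.rfft(x))
    if power:
        weights = weights ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    return np.sum(weights * freqs) / np.sum(weights)


def preemphasis(x, k=0.97):
    """First-order pre-emphasis y[n] = x[n] - k * x[n-1]; k = 0.97 is an assumption."""
    y = np.copy(x)
    y[1:] -= k * x[:-1]
    return y


sample_rate = 48000
rng = np.random.default_rng(0)                 # fixed seed for repeatability
noise = rng.standard_normal(4 * sample_rate)   # 4 seconds of Gaussian white noise
emphasized_noise = preemphasis(noise)

flat = spectral_centroid(noise, sample_rate)
emphasized = spectral_centroid(emphasized_noise, sample_rate)
emphasized_power = spectral_centroid(emphasized_noise, sample_rate, power=True)

print(f"white noise, magnitude-weighted:   {flat:.0f} Hz")
print(f"pre-emphasized, magnitude-weighted: {emphasized:.0f} Hz")
print(f"pre-emphasized, power-weighted:     {emphasized_power:.0f} Hz")
```

The unfiltered noise should land close to 12 kHz, while both pre-emphasized variants come out clearly higher, because the filter attenuates low frequencies relative to high ones.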

If you change line 81 of emo_large.conf (https://github.com/audeering/opensmile/blob/341aea9ce52bc63d7fe75098027a394994f8493b/config/misc/emo_large.conf#L81C1-L81C24) to

reader.dmLevel=frames

you will get the expected result.

More generally, you may also want to try ComParE_2016.conf, which does not use pre-emphasis and has been updated more recently: https://github.com/audeering/opensmile/blob/master/config/compare16/ComParE_2016.conf