MTG / sms-tools

Sound analysis/synthesis tools for music applications
https://www.upf.edu/web/mtg/sms-tools
GNU Affero General Public License v3.0

Why are the STFTs calculated by sms-tools and librosa different? #129

Closed xiaozhah closed 2 years ago

xiaozhah commented 2 years ago

I am trying to analyze piano.wav from the sms-tools sounds directory. The parameters are: fft_num = 1024, hop_length = 256, win_length = 1024, window = 'hanning'

This is the plot produced by librosa's STFT:

image

This is the plot produced by sms-tools:

image

Looking carefully, the phase of the harmonic series is very different: the phase calculated by sms-tools is very flat, whereas the one given by librosa has some glitches.

image

The code that generates the librosa plot is the following:

import numpy as np
import matplotlib.pyplot as plt
import os, sys
sys.path.append(os.path.join(os.path.dirname(os.path.realpath(__file__)), '../models/'))
import utilFunctions as UF
import stft as STFT
import librosa

fs, y = UF.wavread('../../sounds/piano.wav')

H = 256
N = 1024
tol = 1e-14 

# 'hann' is SciPy's canonical name for the Hanning window
x = librosa.stft(y, n_fft = N, hop_length = H, win_length = N, window = 'hann',
    center = True, pad_mode = 'constant')

x = x.T

absX = np.abs(x)
absX[absX<np.finfo(float).eps] = np.finfo(float).eps
mX = 20*np.log10(absX)

x.real[np.abs(x.real) < tol] = 0.0            # for phase calculation set to 0 the small values
x.imag[np.abs(x.imag) < tol] = 0.0            # for phase calculation set to 0 the small values         
pX = np.unwrap(np.angle(x), axis=1)

# create figure to plot
plt.figure(figsize=(9, 6))

# frequency range to plot
maxplotfreq = 5000.0

# plot the magnitude spectrogram
plt.subplot(2,1,1)
numFrames = int(len(mX[:,0]))
frmTime = H*np.arange(numFrames)/float(fs)
binFreq = fs*np.arange(N*maxplotfreq/fs)/N
plt.pcolormesh(frmTime, binFreq, np.transpose(mX[:,:int(N*maxplotfreq/fs+1)]))
plt.xlabel('time (sec)')
plt.ylabel('frequency (Hz)')
plt.title('magnitude spectrogram')
plt.autoscale(tight=True)

# plot the phase spectrogram
plt.subplot(2,1,2)
numFrames = int(len(pX[:,0]))
frmTime = H*np.arange(numFrames)/float(fs)
binFreq = fs*np.arange(N*maxplotfreq/fs)/N
plt.pcolormesh(frmTime, binFreq, np.transpose(np.diff(pX[:,:int(N*maxplotfreq/fs+1)],axis=1)))
plt.xlabel('time (sec)')
plt.ylabel('frequency (Hz)')
plt.title('phase spectrogram (derivative)')
plt.autoscale(tight=True)
plt.show()

Analyzing sine-440.wav with the same parameters shows an even larger difference:

image image

xiaozhah commented 2 years ago

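The reason is that the DFT in sms-tools uses zero-phase windowing (the windowed frame is rearranged in an fftbuffer so that its center sits at sample 0 before the FFT), while librosa does not.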
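
To make the effect of zero-phase windowing concrete, here is a minimal NumPy-only sketch. It is not the sms-tools or librosa code: `plain_dft` and `zero_phase_dft` are hypothetical helper names, the test frame is a synthetic 440 Hz cosine rather than piano.wav, and the plain FFT of the windowed frame is only assumed to stand in for what librosa computes. The sketch checks that the two spectra have identical magnitudes and differ only by a linear phase term, which for an even window of length N reduces to a factor of (-1)^k at bin k.

```python
import numpy as np

def plain_dft(frame, w):
    """FFT of the windowed frame as-is (window center at sample M/2)."""
    return np.fft.rfft(frame * w)

def zero_phase_dft(frame, w, N):
    """FFT after rotating the windowed frame so its center lands at sample 0."""
    M = w.size
    hM1 = (M + 1) // 2          # length of the second half (includes the center)
    hM2 = M // 2                # length of the first half
    xw = frame * w
    buf = np.zeros(N)           # plays the role of the fftbuffer
    buf[:hM1] = xw[hM2:]        # second half goes to the start of the buffer
    buf[N - hM2:] = xw[:hM2]    # first half wraps around to the end
    return np.fft.rfft(buf)

# Assumed test setup: one frame of a 440 Hz cosine, N = M = 1024
N = 1024
fs = 44100
n = np.arange(N)
frame = np.cos(2 * np.pi * 440 * n / fs)
w = np.hanning(N)               # symmetric Hanning window, chosen for illustration

X_plain = plain_dft(frame, w)
X_zero = zero_phase_dft(frame, w, N)

# Shift theorem: a circular advance by N/2 samples multiplies bin k
# by exp(j*2*pi*k*(N/2)/N) = (-1)**k
k = np.arange(N // 2 + 1)
shift = np.exp(1j * 2 * np.pi * k * (N // 2) / N)

max_err = np.max(np.abs(X_zero - X_plain * shift))
mags_equal = np.allclose(np.abs(X_zero), np.abs(X_plain))
print("max deviation from shifted plain DFT:", max_err)
print("magnitudes identical:", mags_equal)
```

If the assumption about librosa holds, this extra pi of phase per bin is what shows up in the frequency derivative of the unwrapped phase, while the magnitude spectrogram is unaffected.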